Tracking Data Updates in Real-time with Change Data Capture
Tzach Livyatan, ScyllaDB VP of Product
Piotr Jastrzębski, ScyllaDB Software Team Lead
About ScyllaDB
+ The Real-Time Big Data Database
+ Drop-in replacement for Apache Cassandra and Amazon DynamoDB
+ 10X the performance & low tail latency
+ Open Source, Enterprise and Cloud options
+ Founded by the creators of KVM hypervisor
+ HQs: Palo Alto, CA, USA; Herzelia, Israel; Warsaw, Poland
Presenters
Tzach Livyatan, VP of Product
Tzach has a 15-year career in development, system engineering and product
management. He has worked in the Telecom domain, focusing on carrier-grade
systems, signalling, policy and charging applications for Oracle and others.
Piotr Jastrzębski, Software Development Team Lead
Piotr is a software engineer with over 12 years of experience in software
development. He worked for a hedge fund called Two Sigma on alpha/system models
execution infrastructure and for Google on Android Java Runtime and Google Search
on Android.
Agenda
+ About ScyllaDB
+ Introducing Scylla Change Data Capture - what is it good for?
+ Using CDC - how to use it?
+ Roadmap - when can I get it?
+ Q&A
Introducing Scylla Change Data Capture (CDC)
Change Data Capture - CDC
Consumable modification record for one or more tables in the database
+ Capture changes (write/update/delete)
+ Asynchronously readable by a consumer
+ Table-level granularity
+ Highly available
+ Persistent
+ (Eventually) consistent
Feeding Microservices
[Diagram labels: CDC Stream, Kafka, Fraud Detection, Data Lake, Real Time Analysis, Search]
Low Coupling Replication
[Diagram labels: Multi DC Deployment, Multi Cluster Deployment, Agent, Agent, CDC Stream]
Low Coupling Replication
                 Multi DC               CDC-based Sync
Granularity      Keyspace               Table
Consistency      Strong / Eventual      Weak
Scylla Cluster   Same cluster           Two distinct clusters
Data             Same on all replicas   Can be different, manipulated by the agent
[Slide labels: Power, Guarantees]
Low Coupling Use Cases
+ Active / Replication deployment
+ Clusters with different Scylla versions / editions / HW
+ Clusters with different TTL for the data
  + Hot cluster: TTL = 1 day
  + Cold cluster: TTL = 1 month
+ Weak Consistency requirements
Using Scylla CDC
Table for Examples
CREATE TABLE company.employees (
department text,
last_name text,
first_name text,
age int,
level int,
PRIMARY KEY (department, last_name, first_name)
) WITH cdc = {/* CDC parameters go here */};
department   last_name   first_name   age   level
Production   Brown       Robert       55    4
Production   Johnson     Alice        33    3
Production   Smith       John         35    2
Marketing    Green       Elizabeth    50    5
Marketing    Miller      Paul         21    1
CDC Modes
+ Delta
+ Preimage
+ Postimage
CDC Modes - Delta
INSERT INTO employees (department, last_name, first_name, age, level) VALUES ('Production', 'Smith', 'John', 35, 2);
department last_name first_name age level
Production Smith John 35 2
department last_name first_name age level
Production Smith John 35 2
CDC Modes - Delta
INSERT INTO employees (department, last_name, first_name, age) VALUES ('Marketing', 'Green', 'Elizabeth', 50);
department last_name first_name age level
Production Smith John 35 2
Marketing Green Elizabeth 50
department last_name first_name age level
Production Smith John 35 2
Marketing Green Elizabeth 50
CDC Modes - Delta
UPDATE employees SET level = 3 WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
department last_name first_name age level
Production Smith John 35 2
Marketing Green Elizabeth 50
Production Smith John 3
department last_name first_name age level
Production Smith John 35 3
Marketing Green Elizabeth 50
CDC Modes - Delta
DELETE age FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
department last_name first_name age level
Production Smith John 35 3
Marketing Green Elizabeth 50
Production Smith John
department last_name first_name age level
Production Smith John 3
Marketing Green Elizabeth 50
CDC Modes - Delta
DELETE FROM employees WHERE department = 'Marketing' AND last_name = 'Green' AND first_name = 'Elizabeth';
department last_name first_name age level
Production Smith John 3
Marketing Green Elizabeth 50
Marketing Green Elizabeth
CDC Modes - Preimage
INSERT INTO employees (department, last_name, first_name, age, level) VALUES ('Production', 'Smith', 'John', 35, 2);
department last_name first_name age level
department last_name first_name age level
Production Smith John 35 2
Production Smith John 35 2
CDC Modes - Preimage
UPDATE employees SET level = 3 WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
department last_name first_name age level
Production Smith John 35 2
Production Smith John 2
department last_name first_name age level
Production Smith John 35 3
Production Smith John 3
CDC Modes - Preimage
DELETE age FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
department last_name first_name age level
Production Smith John 35 3
Production Smith John 35
department last_name first_name age level
Production Smith John 3
Production Smith John
CDC Modes - Preimage
DELETE FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
department last_name first_name age level
Production Smith John 35 3
Production Smith John 35 3
Production Smith John
CDC Modes - Postimage
UPDATE employees SET level = 3 WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
department last_name first_name age level
Production Smith John 35 2
Production Smith John 35 3
department last_name first_name age level
Production Smith John 35 3
Production Smith John 3
CDC Modes - Postimage
DELETE age FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
department last_name first_name age level
Production Smith John 35 3
Production Smith John 3
department last_name first_name age level
Production Smith John 3
Production Smith John
CDC Modes - Postimage
DELETE FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
department last_name first_name age level
Production Smith John 35 3
Production Smith John
Production Smith John
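The three modes walked through above can be mimicked with a small in-memory model. This is only an illustrative sketch, not how Scylla implements CDC: the function name `apply_update` and the dict-based row are hypothetical, and the preimage here is assumed to hold only the columns touched by the write, as in the UPDATE example on the slides.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[int]]


def apply_update(row: Row, changes: Row,
                 preimage: bool = False,
                 postimage: bool = False) -> Tuple[Row, List[Tuple[str, Row]]]:
    """Apply `changes` to `row` and return (new_row, log_entries).

    Hypothetical toy model: each log entry is a (kind, columns) pair.
    """
    log: List[Tuple[str, Row]] = []
    if preimage:
        # Preimage: previous values of the columns this write touches.
        log.append(("PREIMAGE", {col: row.get(col) for col in changes}))
    # Delta: exactly the columns the statement wrote (None models a column delete).
    log.append(("UPDATE", dict(changes)))
    new_row = {**row, **changes}
    if postimage:
        # Postimage: the full row as it looks after the write.
        log.append(("POSTIMAGE", dict(new_row)))
    return new_row, log


row = {"age": 35, "level": 2}
new_row, log = apply_update(row, {"level": 3}, preimage=True, postimage=True)
for kind, cols in log:
    print(kind, cols)
```

With both images enabled, the single UPDATE yields three log entries: the old `level`, the delta, and the full row after the write.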
CREATE TABLE company.employees (
department text,
last_name text,
first_name text,
age int,
level int,
PRIMARY KEY (department, last_name, first_name)
) WITH cdc = {/* CDC parameters go here */};
CDC Log
CREATE TABLE company.employees_scylla_cdc_log (
cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int,
cdc$operation tinyint, cdc$ttl bigint,
department text,
first_name text,
last_name text,
age int, cdc$deleted_age boolean,
level int, cdc$deleted_level boolean,
PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no)
)
0 PREIMAGE
1 UPDATE
2 INSERT
3 ROW DELETE
4 PARTITION DELETE
5 RANGE DELETE INCLUSIVE LOWER BOUND
6 RANGE DELETE EXCLUSIVE LOWER BOUND
7 RANGE DELETE INCLUSIVE UPPER BOUND
8 RANGE DELETE EXCLUSIVE UPPER BOUND
9 POSTIMAGE
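A consumer reading the log would typically translate the numeric cdc$operation value back into one of the names listed above. A minimal sketch in Python follows; the class name `CdcOperation` and the helper `describe` are illustrative choices, not part of any Scylla driver.

```python
from enum import IntEnum


class CdcOperation(IntEnum):
    """Values of the cdc$operation column, as listed on the slide."""
    PREIMAGE = 0
    UPDATE = 1
    INSERT = 2
    ROW_DELETE = 3
    PARTITION_DELETE = 4
    RANGE_DELETE_INCLUSIVE_LOWER_BOUND = 5
    RANGE_DELETE_EXCLUSIVE_LOWER_BOUND = 6
    RANGE_DELETE_INCLUSIVE_UPPER_BOUND = 7
    RANGE_DELETE_EXCLUSIVE_UPPER_BOUND = 8
    POSTIMAGE = 9


def describe(code: int) -> str:
    """Turn a raw cdc$operation value into a readable name."""
    return CdcOperation(code).name.replace("_", " ")


print(describe(2))
print(describe(8))
```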
Operation recorded in cdc$operation for each kind of statement:

UPDATE employees SET level = 3
WHERE department = 'Production'
AND last_name = 'Smith'
AND first_name = 'John';
-- logged as 1 UPDATE

INSERT INTO employees
(department, last_name, first_name, age, level)
VALUES
('Production', 'Smith', 'John', 35, 2);
-- logged as 2 INSERT

DELETE FROM employees
WHERE department = 'Production'
AND last_name = 'Smith'
AND first_name = 'John';
-- logged as 3 ROW DELETE

DELETE FROM employees
WHERE department = 'Production';
-- logged as 4 PARTITION DELETE

DELETE FROM employees
WHERE department = 'Production'
AND last_name >= 'A'
AND last_name < 'C';
-- logged as 5 RANGE DELETE INCLUSIVE LOWER BOUND and 8 RANGE DELETE EXCLUSIVE UPPER BOUND
How to Enable CDC
scylla.yaml:
experimental_features:
    - cdc
Command line: --experimental
CREATE TABLE company.employees ( ...
) WITH cdc = {'enabled' : true};
ALTER TABLE company.employees WITH cdc = {'enabled' : true};
WITH cdc = {
'enabled' : true,
'preimage' : true,
'postimage' : true,
'ttl' : 1000
};
INSERT INTO employees (department, last_name, first_name, level) VALUES ('Production', 'Smith', 'John', 2);
cdc$stream_id 0x6f06e163e099df1b8545901cd23663e3
cdc$time 54b28d54-69cb-11ea-6541-6fa9f3f294ae
cdc$batch_seq_no 0
cdc$operation 2
cdc$ttl null
department Production
last_name Smith
first_name John
age null
level 2
cdc$deleted_age null
cdc$deleted_level null
UPDATE employees SET age = 35 WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
cdc$stream_id 0x6f06e163e099df1b8545901cd23663e3
cdc$time b82e2cb0-69cd-11ea-6a10-8c39c8f7dc3a
cdc$batch_seq_no 0
cdc$operation 1
cdc$ttl null
department Production
last_name Smith
first_name John
age 35
level null
cdc$deleted_age null
cdc$deleted_level null
DELETE age FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
cdc$stream_id 0x6f06e163e099df1b8545901cd23663e3
cdc$time c8f5b55e-69cd-11ea-68c1-e036079d1ce4
cdc$batch_seq_no 0
cdc$operation 1
cdc$ttl null
department Production
last_name Smith
first_name John
age null
level null
cdc$deleted_age true
cdc$deleted_level null
DELETE FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
cdc$stream_id 0x6f06e163e099df1b8545901cd23663e3
cdc$time e75e6612-69cd-11ea-dd3e-2e150dc8859c
cdc$batch_seq_no 0
cdc$operation 3
cdc$ttl null
department Production
last_name Smith
first_name John
age null
level null
cdc$deleted_age null
cdc$deleted_level null
DELETE FROM employees WHERE department = 'Production';
cdc$stream_id 0x6f06e163e099df1b8545901cd23663e3
cdc$time 0a454bbe-69ce-11ea-31f2-98b4c3f08688
cdc$batch_seq_no 0
cdc$operation 4
cdc$ttl null
department Production
last_name null
first_name null
age null
level null
cdc$deleted_age null
cdc$deleted_level null
DELETE FROM employees WHERE department = 'Production' AND last_name >= 'A' AND last_name < 'C';
cdc$stream_id 0x6f06e163e099df1b8545901cd23663e3
cdc$time 447dd18e-69ce-11ea-6ff4-81708ddd348a
cdc$batch_seq_no 0 1
cdc$operation 5 8
cdc$ttl null null
department Production Production
last_name A C
first_name null null
age null null
level null null
cdc$deleted_age null null
cdc$deleted_level null null
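The range delete above is logged as two rows that share one cdc$time, one per bound. A consumer could rebuild the original condition from them roughly as follows. This is a sketch under assumptions: rows are modelled as plain dicts, and the helper name `range_predicate` is hypothetical.

```python
from typing import Dict, List

# cdc$operation codes for range-delete bounds, from the operation list in the deck.
LOWER = {5: ">=", 6: ">"}
UPPER = {7: "<=", 8: "<"}


def range_predicate(rows: List[Dict], column: str) -> str:
    """Rebuild the clustering-key condition from the bound rows of one range delete."""
    parts = []
    # Rows of one statement share cdc$time and are ordered by cdc$batch_seq_no.
    for r in sorted(rows, key=lambda r: r["cdc$batch_seq_no"]):
        op = r["cdc$operation"]
        symbol = LOWER.get(op) or UPPER.get(op)
        if symbol is None:
            raise ValueError(f"not a range-delete bound: {op}")
        parts.append(f"{column} {symbol} '{r[column]}'")
    return " AND ".join(parts)


rows = [
    {"cdc$batch_seq_no": 0, "cdc$operation": 5, "last_name": "A"},
    {"cdc$batch_seq_no": 1, "cdc$operation": 8, "last_name": "C"},
]
print(range_predicate(rows, "last_name"))  # last_name >= 'A' AND last_name < 'C'
```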
CDC Write
RF=3
CL = QUORUM (2)
CDC Streams
CREATE TABLE company.employees_scylla_cdc_log (
cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int,
...
PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no)
)
CREATE TABLE company.employees (
...
PRIMARY KEY (department, last_name, first_name)
) WITH cdc = {/* CDC parameters go here */};
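Because the log's primary key is (cdc$stream_id, cdc$time, cdc$batch_seq_no), entries within one stream are ordered by the timeuuid and then by the batch sequence number. The sketch below reproduces that ordering client-side using only Python's standard `uuid` module; the function name `order_by_stream` is hypothetical, and the keys are taken from the example log rows shown earlier.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Tuple

LogKey = Tuple[str, str, int]  # (cdc$stream_id, cdc$time, cdc$batch_seq_no)


def order_by_stream(keys: List[LogKey]) -> Dict[str, List[LogKey]]:
    """Group log keys by stream and sort each group chronologically."""
    streams: Dict[str, List[LogKey]] = defaultdict(list)
    for key in keys:
        streams[key[0]].append(key)
    for entries in streams.values():
        # UUID.time is the 60-bit timestamp embedded in a time-based UUID;
        # sorting the text form would compare the low timestamp bits first.
        entries.sort(key=lambda k: (uuid.UUID(k[1]).time, k[2]))
    return dict(streams)


stream = "0x6f06e163e099df1b8545901cd23663e3"
keys = [
    (stream, "0a454bbe-69ce-11ea-31f2-98b4c3f08688", 0),  # partition delete
    (stream, "54b28d54-69cb-11ea-6541-6fa9f3f294ae", 0),  # insert
    (stream, "b82e2cb0-69cd-11ea-6a10-8c39c8f7dc3a", 0),  # update
]
for key in order_by_stream(keys)[stream]:
    print(key[1])
```

Sorting the timeuuid strings lexically would place the partition delete first, which is why the embedded timestamp is used as the sort key.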
Benchmark Results
                               CDC disabled   Delta              Delta + Preimage
Throughput (ops)               78,847         61,851 (-21.55%)   33,358 (-57.69%)
Mean latency (ms)              1.3            1.6 (+23.07%)      3.0 (+130.76%)
99th percentile latency (ms)   3.6            4.7 (+30.55%)      9.3 (+158.33%)
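The percentages in the table are changes relative to the "CDC disabled" column. A short sketch of that arithmetic, using the throughput and mean-latency figures from the table (the helper name `relative_change` is illustrative):

```python
def relative_change(baseline: float, measured: float) -> float:
    """Percentage change of `measured` relative to `baseline`."""
    return (measured - baseline) / baseline * 100.0


# Figures copied from the benchmark table above.
throughput = {"CDC disabled": 78_847, "Delta": 61_851, "Delta + Preimage": 33_358}
mean_latency_ms = {"CDC disabled": 1.3, "Delta": 1.6, "Delta + Preimage": 3.0}

for mode in ("Delta", "Delta + Preimage"):
    thr = relative_change(throughput["CDC disabled"], throughput[mode])
    lat = relative_change(mean_latency_ms["CDC disabled"], mean_latency_ms[mode])
    print(f"{mode}: throughput {thr:+.2f}%, mean latency {lat:+.2f}%")
```

The printed values are rounded, so they can differ from the slide in the last digit; the slide's figures appear to be truncated instead.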
Take Away
CDC in Scylla
+ Easy to integrate and consume
  + Plain CQL tables
+ Robust
  + Replicated in the same way as normal data
+ Reasonable overhead
  + Coalesced writes to same replica ranges
+ Does not overflow if consumer fails to act
  + Data expires via TTL
Comparison Chart
                         Cassandra       DynamoDB              MongoDB               Scylla
Consumer location        on-node         off-node              off-node              off-node
Replication              duplicated      deduplicated          deduplicated          deduplicated
Deltas                   yes             no                    partial               yes
Pre-image                no              yes                   no                    optional
Post-image               no              yes                   yes                   optional
Slow consumer reaction   Table stopped   Consumer loses data   Consumer loses data   Consumer loses data
Ordering                 no              yes                   yes                   yes
Try Scylla CDC Now!
+ Download Scylla Nightly Docker and use --experimental
https://hub.docker.com/r/scylladb/scylla-nightly
+ Use and follow the docs:
https://docs.scylladb.com/using-scylla/cdc/
Q&A
Tzach Livyatan Piotr Jastrzębski
piotr@scylladb.com
tzach@scylladb.com
Stay in touch
United States
545 Faber Place
Palo Alto, CA 94303
Israel
11 Galgalei Haplada
Herzelia, Israel
www.scylladb.com
@scylladb
Thank you

Tracking Data Updates in Real-time with Change Data Capture

  • 1.
    Tracking Data Updates inReal-time with Change Data Capture Tzach Livyatan, ScyllaDB VP of Product Piotr Jastrzębski, ScyllaDB Software Team Lead
  • 2.
    2 + The Real-TimeBig Data Database + Drop-in replacement for Apache Cassandra and Amazon DynamoDB + 10X the performance & low tail latency + Open Source, Enterprise and Cloud options + Founded by the creators of KVM hypervisor + HQs: Palo Alto, CA, USA; Herzelia, Israel; Warsaw, Poland About ScyllaDB
  • 3.
    Presenters Tzach Livyatan, VPof Product Tzach has a 15 year career in development, system engineering and product management. He has worked in the Telecom domain, focusing on carrier grade systems, signalling, policy and charging applications for Oracle and others. 3 Piotr Jastrzębski, Software Development Team Lead Piotr is a software engineer with over 12 years of experience in software development. He worked for a hedge fund called Two Sigma on alpha/system models execution infrastructure and for Google on Android Java Runtime and Google Search on Android.
  • 4.
    Agenda 4 + About ScyllaDB +Introducing Scylla Change Data Capture - what is it good for ? + Using CDC - how to use it ? + Roadmap - when can I get it? + QA
  • 5.
  • 6.
    Change Data Capture- CDC Consumable modification record for one or more tables in the database + Capture changes (write/update/delete) + Asynchronously readable by a consumer + Table level Granularity + Highly Available + Persistence + (Eventually) Consistence
  • 7.
  • 8.
    Low Coupling Replication Agent Agent MultiDC Deployment Multi Cluster Deployment CDC Stream
  • 9.
    Low Coupling Replication MultiDC CDC base Sync Granularity Key Space Table Consistency Strong / Eventual Weak Scylla Cluster Same Cluster Two Distinct Clusters Data Same on all replicas Can be different, manipulate by the agent PowerGuarantees
  • 10.
    Low Coupling UseCases + Active / Replication deployment + Clusters with different Scylla versions / editions / HW + Clusters with different TTL for the data. + Hot cluster: TTL = 1 day + Cold cluster: TTL = 1 month + Weak Consistency requirements
  • 11.
  • 12.
  • 13.
    Table for Examples CREATETABLE company.employees ( department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name) ) WITH cdc = {/* CDC parameters go here */}; department last_name first_name age level Production Brown Robert 55 4 Production Johnson Alice 33 3 Production Smith John 35 2 Marketing Green Elizabeth 50 5 Marketing Miller Paul 21 1
  • 14.
    CDC Modes + Delta +Preimage + Postimage
  • 15.
    CDC Modes -Delta INSERT INTO employees (department, last_name, first_name, age, level) VALUES ('Production', 'Smith', 'John', 35, 2); department last_name first_name age level Production Smith John 35 2 department last_name first_name age level Production Smith John 35 2
  • 16.
    CDC Modes -Delta INSERT INTO employees (department, last_name, first_name, age) VALUES ('Marketing', 'Green', 'Elizabeth', 50); department last_name first_name age level Production Smith John 35 2 Marketing Green Elizabeth 50 department last_name first_name age level Production Smith John 35 2 Marketing Green Elizabeth 50
  • 17.
    CDC Modes -Delta UPDATE employees SET level = 3 WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John'; department last_name first_name age level Production Smith John 35 2 Marketing Green Elizabeth 50 Production Smith John 3 department last_name first_name age level Production Smith John 35 3 Marketing Green Elizabeth 50
  • 18.
    CDC Modes -Delta DELETE age FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John'; department last_name first_name age level Production Smith John 35 3 Marketing Green Elizabeth 50 Production Smith John department last_name first_name age level Production Smith John 3 Marketing Green Elizabeth 50
  • 19.
    CDC Modes -Delta DELETE FROM employees WHERE department = 'Marketing' AND last_name = 'Green' AND first_name = 'Elizabeth'; department last_name first_name age level Production Smith John 3 Marketing Green Elizabeth 50 Marketing Green Elizabeth
  • 20.
    CDC Modes -Preimage INSERT INTO employees (department, last_name, first_name, age, level) VALUES ('Production', 'Smith', 'John', 35, 2); department last_name first_name age leveldepartment last_name first_name age level Production Smith John 35 2 Production Smith John 35 2
  • 21.
    CDC Modes -Preimage UPDATE employees SET level = 3 WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John'; department last_name first_name age level Production Smith John 35 2 Production Smith John 2 department last_name first_name age level Production Smith John 35 3 Production Smith John 3
  • 22.
    CDC Modes -Preimage DELETE age FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John'; department last_name first_name age level Production Smith John 35 3 Production Smith John 35 department last_name first_name age level Production Smith John 3 Production Smith John
  • 23.
    CDC Modes -Preimage DELETE FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John'; department last_name first_name age level Production Smith John 35 3 Production Smith John 35 3 Production Smith John
  • 24.
    CDC Modes -Postimage UPDATE employees SET level = 3 WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John'; department last_name first_name age level Production Smith John 35 2 Production Smith John 35 3 department last_name first_name age level Production Smith John 35 3 Production Smith John 3
  • 25.
    CDC Modes -Postimage DELETE age FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John'; department last_name first_name age level Production Smith John 35 3 Production Smith John 3 department last_name first_name age level Production Smith John 3 Production Smith John
  • 26.
    CDC Modes -Postimage DELETE FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John'; department last_name first_name age level Production Smith John 35 3 Production Smith John Production Smith John
  • 27.
    CREATE TABLE company.employees( department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name) ) WITH cdc = {/* CDC parameters go here */}; CDC Log CREATE TABLE company.employees_scylla_cdc_log ( cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no) )
  • 28.
    CDC Log CREATE TABLEcompany.employees ( department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name) ) WITH cdc = {/* CDC parameters go here */}; CREATE TABLE company.employees_scylla_cdc_log ( cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no) )
  • 29.
    CDC Log CREATE TABLEcompany.employees ( department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name) ) WITH cdc = {/* CDC parameters go here */}; CREATE TABLE company.employees_scylla_cdc_log ( cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no) )
  • 30.
    CREATE TABLE company.employees( department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name) ) WITH cdc = {/* CDC parameters go here */}; CREATE TABLE company.employees_scylla_cdc_log ( cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no) ) CDC Log
  • 31.
    CDC Log CREATE TABLEcompany.employees_scylla_cdc_log ( cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no) ) CREATE TABLE company.employees ( department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name) ) WITH cdc = {/* CDC parameters go here */};
  • 32.
    CREATE TABLE company.employees_scylla_cdc_log( cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no) ) CDC Log CREATE TABLE company.employees ( department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name) ) WITH cdc = {/* CDC parameters go here */};
  • 33.
    CREATE TABLE company.employees_scylla_cdc_log( cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no) ) CDC Log CREATE TABLE company.employees ( department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name) ) WITH cdc = {/* CDC parameters go here */};
  • 34.
    CDC Log CREATE TABLEcompany.employees_scylla_cdc_log ( cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no) ) CREATE TABLE company.employees ( department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name) ) WITH cdc = {/* CDC parameters go here */};
  • 35.
    CREATE TABLE company.employees( department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name) ) WITH cdc = {/* CDC parameters go here */}; CDC log CREATE TABLE company.employees_scylla_cdc_log ( cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no) ) 0 PREIMAGE 1 UPDATE 2 INSERT 3 ROW DELETE 4 PARTITION DELETE 5 RANGE DELETE INCLUSIVE LOWER BOUND 6 RANGE DELETE EXCLUSIVE LOWER BOUND 7 RANGE DELETE INCLUSIVE UPPER BOUND 8 RANGE DELETE EXCLUSIVE UPPER BOUND 9 POSTIMAGE
  • 36.
    CREATE TABLE company.employees_scylla_cdc_log( cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no) ) CREATE TABLE company.employees ( department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name) ) WITH cdc = {/* CDC parameters go here */}; CDC log0 PREIMAGE 1 UPDATE 2 INSERT 3 ROW DELETE 4 PARTITION DELETE 5 RANGE DELETE INCLUSIVE LOWER BOUND 6 RANGE DELETE EXCLUSIVE LOWER BOUND 7 RANGE DELETE INCLUSIVE UPPER BOUND 8 RANGE DELETE EXCLUSIVE UPPER BOUND 9 POSTIMAGE UPDATE employees SET level = 3 WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
  • 37.
    CDC log
    CREATE TABLE company.employees_scylla_cdc_log (cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no))
    CREATE TABLE company.employees (department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name)) WITH cdc = {/* CDC parameters go here */};
    Operation codes: 0 PREIMAGE, 1 UPDATE, 2 INSERT, 3 ROW DELETE, 4 PARTITION DELETE, 5 RANGE DELETE INCLUSIVE LOWER BOUND, 6 RANGE DELETE EXCLUSIVE LOWER BOUND, 7 RANGE DELETE INCLUSIVE UPPER BOUND, 8 RANGE DELETE EXCLUSIVE UPPER BOUND, 9 POSTIMAGE
    INSERT INTO employees (department, last_name, first_name, age, level) VALUES ('Production', 'Smith', 'John', 35, 2);
  • 38.
    CDC log
    CREATE TABLE company.employees_scylla_cdc_log (cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no))
    CREATE TABLE company.employees (department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name)) WITH cdc = {/* CDC parameters go here */};
    Operation codes: 0 PREIMAGE, 1 UPDATE, 2 INSERT, 3 ROW DELETE, 4 PARTITION DELETE, 5 RANGE DELETE INCLUSIVE LOWER BOUND, 6 RANGE DELETE EXCLUSIVE LOWER BOUND, 7 RANGE DELETE INCLUSIVE UPPER BOUND, 8 RANGE DELETE EXCLUSIVE UPPER BOUND, 9 POSTIMAGE
    DELETE FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
  • 39.
    CDC log
    CREATE TABLE company.employees_scylla_cdc_log (cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no))
    CREATE TABLE company.employees (department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name)) WITH cdc = {/* CDC parameters go here */};
    Operation codes: 0 PREIMAGE, 1 UPDATE, 2 INSERT, 3 ROW DELETE, 4 PARTITION DELETE, 5 RANGE DELETE INCLUSIVE LOWER BOUND, 6 RANGE DELETE EXCLUSIVE LOWER BOUND, 7 RANGE DELETE INCLUSIVE UPPER BOUND, 8 RANGE DELETE EXCLUSIVE UPPER BOUND, 9 POSTIMAGE
    DELETE FROM employees WHERE department = 'Production';
  • 40.
    CDC log
    CREATE TABLE company.employees (department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name)) WITH cdc = {/* CDC parameters go here */};
    Operation codes: 0 PREIMAGE, 1 UPDATE, 2 INSERT, 3 ROW DELETE, 4 PARTITION DELETE, 5 RANGE DELETE INCLUSIVE LOWER BOUND, 6 RANGE DELETE EXCLUSIVE LOWER BOUND, 7 RANGE DELETE INCLUSIVE UPPER BOUND, 8 RANGE DELETE EXCLUSIVE UPPER BOUND, 9 POSTIMAGE
    CREATE TABLE company.employees_scylla_cdc_log (cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no))
    DELETE FROM employees WHERE department = 'Production' AND last_name >= 'A' AND last_name < 'C';
  • 41.
    CDC Log
    CREATE TABLE company.employees_scylla_cdc_log (cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, cdc$operation tinyint, cdc$ttl bigint, department text, first_name text, last_name text, age int, cdc$deleted_age boolean, level int, cdc$deleted_level boolean, PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no))
    CREATE TABLE company.employees (department text, last_name text, first_name text, age int, level int, PRIMARY KEY (department, last_name, first_name)) WITH cdc = {/* CDC parameters go here */};
  • 42.
    How to Enable CDC
    scylla.yaml:
        experimental_features:
            - cdc
    Command line: --experimental
    CREATE TABLE company.employees ( ... ) WITH cdc = {'enabled' : true};
    ALTER TABLE company.employees WITH cdc = {'enabled' : true};
    WITH cdc = { 'enabled' : true, 'preimage' : true, 'postimage' : true, 'ttl' : 1000 };
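When the CDC options are toggled from a deployment script, the `WITH cdc = {...}` clause can be generated rather than hand-written. The sketch below only formats the statement text using the four option names that appear on this slide; the function names are hypothetical, and sending the statement through a driver session is left out.

```python
def cdc_options(enabled=True, preimage=None, postimage=None, ttl=None) -> str:
    """Render the map literal used in `WITH cdc = {...}`.

    Options left as None are omitted from the map.
    """
    opts = {"enabled": enabled, "preimage": preimage,
            "postimage": postimage, "ttl": ttl}
    parts = []
    for key, value in opts.items():
        if value is None:
            continue  # option not specified; leave it out of the map
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            rendered = "true" if value else "false"
        else:
            rendered = str(int(value))
        parts.append(f"'{key}' : {rendered}")
    return "{" + ", ".join(parts) + "}"


def alter_table_cdc(table: str, **options) -> str:
    """Build an ALTER TABLE statement that sets the CDC options on `table`."""
    return f"ALTER TABLE {table} WITH cdc = {cdc_options(**options)};"


print(alter_table_cdc("company.employees"))
print(alter_table_cdc("company.employees", preimage=True, postimage=True, ttl=1000))
```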
  • 43.
    INSERT INTO employees (department, last_name, first_name, level) VALUES ('Production', 'Smith', 'John', 2);
    cdc$stream_id      0x6f06e163e099df1b8545901cd23663e3
    cdc$time           54b28d54-69cb-11ea-6541-6fa9f3f294ae
    cdc$batch_seq_no   0
    cdc$operation      2
    cdc$ttl            null
    department         Production
    last_name          Smith
    first_name         John
    age                null
    level              2
    cdc$deleted_age    null
    cdc$deleted_level  null
  • 44.
    UPDATE employees SET age = 35 WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
    cdc$stream_id      0x6f06e163e099df1b8545901cd23663e3
    cdc$time           b82e2cb0-69cd-11ea-6a10-8c39c8f7dc3a
    cdc$batch_seq_no   0
    cdc$operation      1
    cdc$ttl            null
    department         Production
    last_name          Smith
    first_name         John
    age                35
    level              null
    cdc$deleted_age    null
    cdc$deleted_level  null
  • 45.
    DELETE age FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
    cdc$stream_id      0x6f06e163e099df1b8545901cd23663e3
    cdc$time           c8f5b55e-69cd-11ea-68c1-e036079d1ce4
    cdc$batch_seq_no   0
    cdc$operation      1
    cdc$ttl            null
    department         Production
    last_name          Smith
    first_name         John
    age                null
    level              null
    cdc$deleted_age    true
    cdc$deleted_level  null
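Both the UPDATE and the column DELETE are logged with cdc$operation 1, so a consumer has to look at the value column together with its cdc$deleted_ flag to tell "set to a value" from "deleted" from "not touched". A minimal sketch of that decision, assuming log rows arrive as plain dictionaries keyed by column name (the function name and row representation are illustrative):

```python
def describe_delta(row: dict, value_columns=("age", "level")) -> dict:
    """Map each non-key column to ('set', value) or ('deleted', None).

    Columns the statement did not touch are left out of the result.
    """
    changes = {}
    for col in value_columns:
        if row.get(f"cdc$deleted_{col}"):
            changes[col] = ("deleted", None)
        elif row.get(col) is not None:
            changes[col] = ("set", row[col])
    return changes


# Rows copied from the UPDATE and DELETE age slides (key columns omitted).
update_row = {"cdc$operation": 1, "age": 35, "level": None,
              "cdc$deleted_age": None, "cdc$deleted_level": None}
delete_age_row = {"cdc$operation": 1, "age": None, "level": None,
                  "cdc$deleted_age": True, "cdc$deleted_level": None}

print(describe_delta(update_row))      # {'age': ('set', 35)}
print(describe_delta(delete_age_row))  # {'age': ('deleted', None)}
```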
  • 46.
    DELETE FROM employees WHERE department = 'Production' AND last_name = 'Smith' AND first_name = 'John';
    cdc$stream_id      0x6f06e163e099df1b8545901cd23663e3
    cdc$time           e75e6612-69cd-11ea-dd3e-2e150dc8859c
    cdc$batch_seq_no   0
    cdc$operation      3
    cdc$ttl            null
    department         Production
    last_name          Smith
    first_name         John
    age                null
    level              null
    cdc$deleted_age    null
    cdc$deleted_level  null
  • 47.
    DELETE FROM employees WHERE department = 'Production';
    cdc$stream_id      0x6f06e163e099df1b8545901cd23663e3
    cdc$time           0a454bbe-69ce-11ea-31f2-98b4c3f08688
    cdc$batch_seq_no   0
    cdc$operation      4
    cdc$ttl            null
    department         Production
    last_name          null
    first_name         null
    age                null
    level              null
    cdc$deleted_age    null
    cdc$deleted_level  null
  • 48.
    DELETE FROM employees WHERE department = 'Production' AND last_name >= 'A' AND last_name < 'C';
    cdc$stream_id      0x6f06e163e099df1b8545901cd23663e3
    cdc$time           447dd18e-69ce-11ea-6ff4-81708ddd348a
    cdc$batch_seq_no   0            1
    cdc$operation      5            8
    cdc$ttl            null         null
    department         Production   Production
    last_name          A            C
    first_name         null         null
    age                null         null
    level              null         null
    cdc$deleted_age    null         null
    cdc$deleted_level  null         null
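A range delete shows up as two log rows: one lower-bound marker and one upper-bound marker, distinguished by cdc$batch_seq_no. The sketch below reassembles such a pair into a readable condition. It assumes the rows have already been grouped by stream and cdc$time and are passed as dictionaries; the function name and the operator tables are illustrative.

```python
LOWER = {5: ">=", 6: ">"}  # inclusive / exclusive lower bound
UPPER = {7: "<=", 8: "<"}  # inclusive / exclusive upper bound


def range_delete_predicate(rows, column="last_name") -> str:
    """Combine the boundary rows of one range delete into a condition string."""
    conditions = []
    for row in sorted(rows, key=lambda r: r["cdc$batch_seq_no"]):
        op = row["cdc$operation"]
        symbol = LOWER.get(op) or UPPER.get(op)
        if symbol is None:
            raise ValueError(f"not a range-delete bound: {op}")
        conditions.append(f"{column} {symbol} '{row[column]}'")
    return " AND ".join(conditions)


# The two rows from the slide, reduced to the columns used here.
rows = [
    {"cdc$batch_seq_no": 0, "cdc$operation": 5, "last_name": "A"},
    {"cdc$batch_seq_no": 1, "cdc$operation": 8, "last_name": "C"},
]
print(range_delete_predicate(rows))  # last_name >= 'A' AND last_name < 'C'
```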
  • 49.
  • 50.
    CDC Streams
    CREATE TABLE company.employees_scylla_cdc_log (cdc$stream_id blob, cdc$time timeuuid, cdc$batch_seq_no int, ... PRIMARY KEY (cdc$stream_id, cdc$time, cdc$batch_seq_no))
    CREATE TABLE company.employees ( ... PRIMARY KEY (department, last_name, first_name)) WITH cdc = {/* CDC parameters go here */};
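Because the log is partitioned by cdc$stream_id and clustered by cdc$time and cdc$batch_seq_no, rows within one stream can be replayed in write order. The following sketch reproduces that ordering on the client side for rows of a single stream, using the timestamp embedded in the version 1 UUIDs from the earlier slides. It assumes cdc$time is available as a string; a driver would normally hand back a UUID object already, and the function name is illustrative.

```python
import uuid


def stream_order_key(row: dict):
    """Sort key mirroring the clustering columns (cdc$time, cdc$batch_seq_no)."""
    return (uuid.UUID(row["cdc$time"]).time, row["cdc$batch_seq_no"])


# Rows from the example slides, deliberately shuffled.
rows = [
    {"cdc$time": "b82e2cb0-69cd-11ea-6a10-8c39c8f7dc3a", "cdc$batch_seq_no": 0, "cdc$operation": 1},
    {"cdc$time": "447dd18e-69ce-11ea-6ff4-81708ddd348a", "cdc$batch_seq_no": 1, "cdc$operation": 8},
    {"cdc$time": "54b28d54-69cb-11ea-6541-6fa9f3f294ae", "cdc$batch_seq_no": 0, "cdc$operation": 2},
    {"cdc$time": "447dd18e-69ce-11ea-6ff4-81708ddd348a", "cdc$batch_seq_no": 0, "cdc$operation": 5},
]

ordered = sorted(rows, key=stream_order_key)
print([r["cdc$operation"] for r in ordered])  # [2, 1, 5, 8]
```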
  • 51.
    Benchmark Results
                                   CDC disabled   Delta              Delta + Preimage
    Throughput (ops)               78,847         61,851 (-21.55%)   33,358 (-57.69%)
    Mean latency (ms)              1.3            1.6 (+23.07%)      3.0 (+130.76%)
    99th percentile latency (ms)   3.6            4.7 (+30.55%)      9.3 (+158.33%)
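The percentages in the table are relative changes against the "CDC disabled" column. A short sketch of that arithmetic for the throughput row, using only the figures from the slide (the helper name is illustrative); the computed values agree with the slide to within 0.01 percentage points.

```python
def pct_change(baseline: float, measured: float) -> float:
    """Relative change of `measured` against `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


# Throughput figures from the benchmark slide.
throughput = {"CDC disabled": 78_847, "Delta": 61_851, "Delta + Preimage": 33_358}
base = throughput["CDC disabled"]

for name, ops in throughput.items():
    print(f"{name}: {pct_change(base, ops):+.2f}%")
```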
  • 52.
  • 53.
    CDC in Scylla
    + Easy to integrate and consume
      + Plain CQL tables
    + Robust
      + Replicated in the same way as normal data
    + Reasonable overhead
      + Coalesced writes to same replica ranges
    + Does not overflow if consumer fails to act
      + Data is TTL'ed
  • 54.
    Comparison Chart
                             Cassandra       DynamoDB              MongoDB               Scylla
    Consumer location        on-node         off-node              off-node              off-node
    Replication              duplicated      deduplicated          deduplicated          deduplicated
    Deltas                   yes             no                    partial               yes
    Pre-image                no              yes                   no                    optional
    Post-image               no              yes                   yes                   optional
    Slow consumer reaction   Table stopped   Consumer loses data   Consumer loses data   Consumer loses data
    Ordering                 no              yes                   yes                   yes
  • 55.
    Try Scylla CDC Now!
    + Download Scylla Nightly Docker and use --experimental
      https://hub.docker.com/r/scylladb/scylla-nightly
    + Use and follow the docs:
      https://docs.scylladb.com/using-scylla/cdc/
  • 56.
  • 57.
  • 58.
    United States: 545 Faber Place, Palo Alto, CA 94303
    Israel: 11 Galgalei Haplada, Herzelia, Israel
    www.scylladb.com
    @scylladb
    Thank you