Event Sourcing with Cassandra
Luke Tillman
Technical Evangelist
@LukeTillman
Who are you?
• Evangelist with a focus on Developers
– Long-time Developer on RDBMS (lots of .NET)
• I still write a lot of code, but now I also do a lot of teaching and speaking
A Quick Recap of Event Sourcing
Persistence with Event Sourcing
• Instead of keeping the current state, keep a journal of all the deltas (events)
• Append only (no UPDATE or DELETE)
• We can replay our journal of events to get the current state
Shopping Cart (id = 1345)
Cart Created: user_id = 4762, created_on = 7/10/2015…
Item Added: item_id = 7621, quantity = 1, price = 19.99
Item Added: item_id = 9134, quantity = 2, price = 16.99
Item Removed: item_id = 7621
Qty Changed: item_id = 9134, quantity = 1
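Replaying the shopping cart journal above can be sketched in a few lines of Python. This is only an illustration: the `Cart` class, the event names as strings, and the `apply` and `replay` helpers are hypothetical and not part of any library.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Cart:
    cart_id: int
    user_id: Optional[int] = None
    # item_id -> {"quantity": ..., "price": ...}
    items: Dict[int, dict] = field(default_factory=dict)


def apply(cart, event):
    """Apply a single (kind, data) event to the cart in place."""
    kind, data = event
    if kind == "CartCreated":
        cart.user_id = data["user_id"]
    elif kind == "ItemAdded":
        cart.items[data["item_id"]] = {
            "quantity": data["quantity"],
            "price": data["price"],
        }
    elif kind == "ItemRemoved":
        cart.items.pop(data["item_id"], None)
    elif kind == "QtyChanged":
        cart.items[data["item_id"]]["quantity"] = data["quantity"]
    return cart


def replay(cart_id, events):
    """Rebuild the current state by applying every event in order."""
    cart = Cart(cart_id)
    for event in events:
        apply(cart, event)
    return cart


# The journal from the slide, oldest event first.
JOURNAL = [
    ("CartCreated", {"user_id": 4762}),
    ("ItemAdded", {"item_id": 7621, "quantity": 1, "price": 19.99}),
    ("ItemAdded", {"item_id": 9134, "quantity": 2, "price": 16.99}),
    ("ItemRemoved", {"item_id": 7621}),
    ("QtyChanged", {"item_id": 9134, "quantity": 1}),
]

cart = replay(1345, JOURNAL)
print(cart.items)
```

After the replay, the cart holds only item 9134 with quantity 1, which is the current state implied by the five events.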
Event Sourcing in Practice
• Typically two kinds of storage:
– Event Journal Store
– Snapshot Store
• A history of how we got to the current state can be useful
• We've also got a lot more data to store than we did before
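How the two stores work together can be sketched as follows, assuming a snapshot is simply a (sequence number, state) pair and that recovery replays only the events written after it. The event shape (item id plus quantity delta) and the function names are hypothetical, chosen to keep the example small.

```python
def apply_event(state, event):
    """Apply one (item_id, quantity_delta) event to a dict of item quantities."""
    item_id, delta = event
    new_state = dict(state)
    new_state[item_id] = new_state.get(item_id, 0) + delta
    if new_state[item_id] == 0:
        del new_state[item_id]
    return new_state


def recover(journal, snapshot=None):
    """Rebuild state from an optional snapshot plus the events written after it.

    journal:  list of (sequence_nr, event), ordered by sequence_nr
    snapshot: (sequence_nr, state) or None
    """
    snapshot_seq, state = snapshot if snapshot else (0, {})
    for sequence_nr, event in journal:
        if sequence_nr > snapshot_seq:
            state = apply_event(state, event)
    return state


journal = [(1, (7621, 1)), (2, (9134, 2)), (3, (7621, -1)), (4, (9134, -1))]
snapshot = (2, {7621: 1, 9134: 2})  # state as of sequence number 2

full = recover(journal)            # replays all four events
fast = recover(journal, snapshot)  # replays only events 3 and 4
print(full == fast)
```

Both paths end in the same state; the snapshot only shortens the replay.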
Why use Cassandra for Event Sourcing?
• Transactional (OLTP) Workload
• Sequentially written, immutable data
– Looks a lot like time series data
• Easy to scale out to capture more events
Event Sourcing Example: Akka Persistence
Akka Persistence Journal API Summary
• Write Method
– For a given actor, write a group of messages
• Delete Method
– For a given actor, permanently or logically delete all messages up to a given sequence number
• Read Methods
– For a given actor, read back all the messages between two sequence numbers
– For a given actor, read the highest sequence number that's been written
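The four operations summarized above can be modeled with a small in-memory journal. This is a sketch of the contract only: the class and method names are hypothetical and are not the actual Akka Persistence plugin API, which is defined in Scala/Java.

```python
class InMemoryJournal:
    """Toy journal keyed by persistence id (one id per actor)."""

    def __init__(self):
        self._rows = {}        # persistence_id -> {sequence_nr: payload}
        self._deleted_to = {}  # persistence_id -> highest logically deleted nr
        self._highest = {}     # persistence_id -> highest nr ever written

    def write(self, persistence_id, messages):
        """Write a group of (sequence_nr, payload) messages."""
        rows = self._rows.setdefault(persistence_id, {})
        for sequence_nr, payload in messages:
            rows[sequence_nr] = payload
            self._highest[persistence_id] = max(
                self._highest.get(persistence_id, 0), sequence_nr
            )

    def delete_to(self, persistence_id, to_sequence_nr, permanent):
        """Permanently or logically delete everything up to to_sequence_nr."""
        if permanent:
            rows = self._rows.get(persistence_id, {})
            for sequence_nr in [s for s in rows if s <= to_sequence_nr]:
                del rows[sequence_nr]
        else:
            self._deleted_to[persistence_id] = max(
                self._deleted_to.get(persistence_id, 0), to_sequence_nr
            )

    def read(self, persistence_id, from_nr, to_nr):
        """Read back live messages with from_nr <= sequence_nr <= to_nr."""
        rows = self._rows.get(persistence_id, {})
        lower = max(from_nr, self._deleted_to.get(persistence_id, 0) + 1)
        return [(s, rows[s]) for s in sorted(rows) if lower <= s <= to_nr]

    def highest_sequence_nr(self, persistence_id):
        """Highest sequence number ever written, even if deleted since."""
        return self._highest.get(persistence_id, 0)


journal = InMemoryJournal()
journal.write("57ab", [(1, b"a"), (2, b"b"), (3, b"c")])
journal.delete_to("57ab", 1, permanent=False)
print(journal.read("57ab", 1, 3))
```

The rest of the deck is about mapping these four operations onto Cassandra tables without running into partition-size, batch, or tombstone problems.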
An Event Journal in Cassandra
Data Modeling for Reads and Writes
A Simple First Attempt
• Use persistence_id as partition key
– all messages for a given persistence id together
• Use sequence_number as clustering column
– order messages by sequence number inside a partition
• Read all messages between two sequence numbers
• Read the highest sequence number
CREATE TABLE messages (
persistence_id text,
sequence_number bigint,
message blob,
PRIMARY KEY (
persistence_id, sequence_number)
);
SELECT * FROM messages
WHERE persistence_id = ?
AND sequence_number >= ?
AND sequence_number <= ?;
SELECT sequence_number FROM messages
WHERE persistence_id = ?
ORDER BY sequence_number DESC LIMIT 1;
A Simple First Attempt
• Write a group of messages
• Use a Cassandra Batch statement to ensure all messages (success) or no messages (failure) get written
• What's the problem with this data model (ignoring implementing deletes for now)?
BEGIN BATCH
INSERT INTO messages ... ;
INSERT INTO messages ... ;
INSERT INTO messages ... ;
APPLY BATCH;
Unbounded Partition Growth
• Cassandra has a hard limit of 2 billion cells in a partition
• But there's also a practical limit
– Depends on row/cell data size, but likely not more than millions of rows
[Diagram: every INSERT INTO messages ... appends another row (seq_nr = 1, seq_nr = 2, ..., each with message = 0x00...) to the single Journal partition for persistence_id = '57ab...', which grows without bound]
Fixing the Unbounded Partition Growth Problem
• General strategy: add a column to the partition key
– Compound partition key
• Can be data that's already part of the model, or a "synthetic" column
• Allow users to configure a partition size in the plugin
– Partition Size = number of rows per partition
– This should not be changeable once messages have been written
• Partition number for a given sequence number is then easy to calculate (using integer division)
– (seqNr - 1) / partitionSize
(100 - 1) / 100 = partition 0
(101 - 1) / 100 = partition 1
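The partition calculation above relies on integer division. A minimal Python sketch, where the function name is hypothetical and sequence numbers are assumed to start at 1:

```python
def partition_number(sequence_nr: int, partition_size: int) -> int:
    """Map a 1-based sequence number to a 0-based partition number."""
    if sequence_nr < 1 or partition_size < 1:
        raise ValueError("sequence_nr and partition_size must be >= 1")
    # Floor division: sequence numbers 1..partition_size land in partition 0.
    return (sequence_nr - 1) // partition_size


print(partition_number(100, 100))
print(partition_number(101, 100))
```

Because the result depends on the partition size, changing the size after messages have been written would send reads to the wrong partitions, which is why it should stay fixed.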
CREATE TABLE messages (
persistence_id text,
partition_number bigint,
sequence_number bigint,
message blob,
PRIMARY KEY (
(persistence_id, partition_number),
sequence_number)
);
Fixing the Unbounded Partition Growth Problem
• Read all messages between two sequence numbers
• Read the highest sequence number
SELECT * FROM messages
WHERE persistence_id = ?
AND partition_number = ?
AND sequence_number >= ?
AND sequence_number <= ?;
(repeat until we reach sequence number or run out of partitions)
SELECT sequence_number FROM messages
WHERE persistence_id = ?
AND partition_number = ?
ORDER BY sequence_number DESC LIMIT 1;
(repeat until we run out of partitions)
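The "repeat" notes on the two queries amount to a loop over partitions. The sketch below models the table as a Python dict keyed by (persistence_id, partition_number) instead of calling a real driver. The function names are hypothetical, and it assumes partitions are filled in order starting from partition 0 with no empty partition in between.

```python
def partition_number(sequence_nr, partition_size):
    return (sequence_nr - 1) // partition_size


def read_range(table, persistence_id, from_nr, to_nr, partition_size):
    """Read messages between two sequence numbers, one partition at a time."""
    result = []
    partition = partition_number(from_nr, partition_size)
    while True:
        rows = table.get((persistence_id, partition), {})
        if not rows:
            break  # ran out of partitions
        result.extend(
            (s, rows[s]) for s in sorted(rows) if from_nr <= s <= to_nr
        )
        if max(rows) >= to_nr:
            break  # reached the upper sequence number
        partition += 1
    return result


def highest_sequence_nr(table, persistence_id):
    """Walk partitions until an empty one is found, keeping the last maximum."""
    partition, highest = 0, 0
    while True:
        rows = table.get((persistence_id, partition), {})
        if not rows:
            return highest
        highest = max(rows)
        partition += 1


# Seven messages with a partition size of 3: partitions 0, 1 and 2.
PARTITION_SIZE = 3
table = {}
for seq_nr in range(1, 8):
    key = ("57ab", partition_number(seq_nr, PARTITION_SIZE))
    table.setdefault(key, {})[seq_nr] = b"\x00"

print([s for s, _ in read_range(table, "57ab", 2, 5, PARTITION_SIZE)])
print(highest_sequence_nr(table, "57ab"))
```

Each loop iteration corresponds to one execution of the matching SELECT with the next partition_number bound in.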
Fixing the Unbounded Partition Growth Problem
• Write a group of messages
• A Cassandra Batch statement might now write to multiple partitions (if the sequence numbers cross a partition boundary)
• Is that a problem?
BEGIN BATCH
INSERT INTO messages ... ;
INSERT INTO messages ... ;
INSERT INTO messages ... ;
APPLY BATCH;
RTFM: Cassandra Batches Edition
"Batches are atomic by default. In the context of a Cassandra batch
operation, atomic means that if any of the batch succeeds, all of it will."
- DataStax CQL Docs
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html
"Although an atomic batch guarantees that if any part of the batch succeeds,
all of it will, no other transactional enforcement is done at the batch level.
For example, there is no batch isolation. Clients are able to read the first
updated rows from the batch, while other rows are still being updated on the
server."
- DataStax CQL Docs
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html
Atomic? That's kind of a loaded word.
Multiple Partition Batch Failure Scenario
• Once written to the Batch Log successfully, we know all the writes in the batch will succeed eventually (atomic?)
• Batch has been partially applied
• Possible to read a partially applied batch since there is no batch isolation
[Diagram: a BEGIN BATCH ... APPLY BATCH; statement is sent to the Journal cluster with RF = 3 at CL = QUORUM and is recorded in the Batch Log; the client then receives a WriteTimeout with writeType = BATCH]
RTFM: Cassandra Batches Edition Part 2
"For example, there is no batch isolation. Clients are able to read the first
updated rows from the batch, while other rows are still being updated on the
server. However, transactional row updates within a partition key are
isolated: clients cannot read a partial update."
- DataStax CQL Docs
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html
What we really need is Isolation.
When writing a group of messages, ensure that
we write the group to a single partition.
Logic Changes to Ensure Batch Isolation
• Still use configurable Partition Size
– not a "hard limit" but a "best attempt"
• On write, see if messages will all fit in the current partition
• If not, roll over to the next partition early
• Reading is slightly more complicated
– A given sequence number might be in partition n or (n+1)
[Diagram: PartitionSize = 100; partition_nr = 1 already holds seq_nr = 97 and seq_nr = 98, and a group of messages with seq_nr = 99, 100 and 101 is written together to partition_nr = 2]
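One way to implement the early roll-over is sketched below. It is an assumption about how such logic could look, not the plugin's actual code: the helper names are hypothetical, partitions are numbered from 0 using (seqNr - 1) / partitionSize, and a group is assumed to be no larger than the partition size.

```python
def partition_number(sequence_nr, partition_size):
    return (sequence_nr - 1) // partition_size


def partition_for_batch(sequence_nrs, partition_size):
    """Pick one partition for a whole group of messages.

    If the group would cross a partition boundary, the whole group is
    written to the later partition (rolling over early), so a batch
    never spans two partitions.
    """
    if len(sequence_nrs) > partition_size:
        raise ValueError("group is larger than the partition size")
    first = partition_number(min(sequence_nrs), partition_size)
    last = partition_number(max(sequence_nrs), partition_size)
    return last if last != first else first


def candidate_partitions(sequence_nr, partition_size):
    """On read, a sequence number is in its nominal partition n or in n + 1."""
    n = partition_number(sequence_nr, partition_size)
    return (n, n + 1)


print(partition_for_batch([97, 98], 100))
print(partition_for_batch([99, 100, 101], 100))
print(candidate_partitions(99, 100))
```

With a partition size of 100, the group 99, 100, 101 goes entirely into the partition that 101 belongs to, so a reader looking for sequence number 99 has to check its nominal partition and the next one.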
Accounting for Deletes
Option 1: Mark Individual Messages as Deleted
• Add an is_deleted column to our messages table
• Read all messages between two sequence numbers
CREATE TABLE messages (
persistence_id text,
partition_number bigint,
sequence_number bigint,
message blob,
is_deleted boolean,
PRIMARY KEY (
(persistence_id, partition_number),
sequence_number)
);
SELECT * FROM messages
WHERE persistence_id = ?
AND partition_number = ?
AND sequence_number >= ?
AND sequence_number <= ?;
(repeat until we reach sequence number or run out of partitions)
... sequence_number  message  is_deleted
... 1                0x00     true
... 2                0x00     true
... 3                0x00     false
... 4                0x00     false
Option 1: Mark Individual Messages as Deleted
• Pros:
– On replay, easy to check if a message has been deleted (comes included in message query's data)
• Cons:
– Messages not immutable any more
– Issue lots of UPDATEs to mark each message as deleted
– Have to scan through a lot of rows to find max deleted sequence number if we want to avoid issuing unnecessary UPDATEs
Option 2: Write a Marker Row for Each Deleted Row
• Add a marker column and make it a clustering column
– Messages written with 'A'
– Deletes get written with 'D'
• Read all messages between two sequence numbers
CREATE TABLE messages (
persistence_id text,
partition_number bigint,
sequence_number bigint,
marker text,
message blob,
PRIMARY KEY (
(persistence_id, partition_number),
sequence_number, marker)
);
SELECT * FROM messages
WHERE persistence_id = ?
AND partition_number = ?
AND sequence_number >= ?
AND sequence_number <= ?;
(repeat until we reach sequence number or run out of partitions)
... sequence_number  marker  message
... 1                A       0x00
... 1                D       null
... 2                A       0x00
... 3                A       0x00
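With this layout, a replay can decide whether a message is deleted by peeking at the row that follows it. A minimal sketch, assuming the rows arrive sorted by (sequence_number, marker) as tuples; the function name is hypothetical:

```python
def live_messages(rows):
    """Return (sequence_nr, message) for every 'A' row without a 'D' marker.

    rows: list of (sequence_nr, marker, message) in clustering order,
    so a 'D' row, if present, directly follows its 'A' row.
    """
    result = []
    for i, (sequence_nr, marker, message) in enumerate(rows):
        if marker != "A":
            continue  # delete markers carry no message
        following = rows[i + 1] if i + 1 < len(rows) else None
        if following and following[0] == sequence_nr and following[1] == "D":
            continue  # the next row marks this message as deleted
        result.append((sequence_nr, message))
    return result


# The rows from the slide.
rows = [
    (1, "A", b"\x00"),
    (1, "D", None),
    (2, "A", b"\x00"),
    (3, "A", b"\x00"),
]
print(live_messages(rows))
```

Message 1 is skipped because a 'D' row with the same sequence number follows it; messages 2 and 3 are replayed.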
Option 2: Write a Marker Row for Each Deleted Row
• Pros
– On replay, easy to peek at next row to check if deleted (comes included in message query's data)
– Message data stays immutable
• Cons
– Issue lots of INSERTs to mark each message as deleted
– Have to scan through a lot of rows to find max deleted sequence number if we want to avoid issuing unnecessary INSERTs
– Potentially twice as many rows to store
Looking at Physical Deletes
• Physically delete messages to a given sequence number
• Still probably want to scan through rows to see what's already been deleted first
BEGIN BATCH
DELETE FROM messages
WHERE persistence_id = ?
AND partition_number = ?
AND marker = 'A'
AND sequence_number = ?;
...
APPLY BATCH;
• Can't range delete, so we have to do lots of individual DELETEs
Looking at Physical Deletes
• Read all messages between two sequence numbers
• With how DELETEs work in Cassandra, is there a potential problem with this query?
SELECT * FROM messages
WHERE persistence_id = ?
AND partition_number = ?
AND sequence_number >= ?
AND sequence_number <= ?;
(repeat until we reach sequence number or run out of partitions)
Tombstone Hell: Queue-like Data Sets
Delete messages to a sequence number
BEGIN BATCH
DELETE FROM messages
WHERE persistence_id = '57ab...'
AND partition_number = 1
AND marker = 'A'
AND sequence_number = 1;
...
APPLY BATCH;
[Diagram: in the Journal partition (persistence_id = '57ab...', partition_nr = 1), each deleted row now has two versions: the original cell (seq_nr = 1, marker = 'A', message = 0x00...) and a tombstone for the same key (seq_nr = 1, marker = 'A', NO DATA HERE); likewise for seq_nr = 2]
Tombstone Hell: Queue-like Data Sets
• At some point compaction runs and we don't have two versions any more, but tombstones don't go away immediately
– Tombstones remain for gc_grace_seconds
– Default is 10 days
[Diagram: after compaction, the Journal partition (persistence_id = '57ab...', partition_nr = 1) holds only the tombstones for seq_nr = 1 and seq_nr = 2 (marker = 'A', NO DATA HERE)]
Tombstone Hell: Queue-like Data Sets
Read all messages between 2 sequence numbers
SELECT * FROM messages
WHERE persistence_id = '57ab...'
AND partition_number = 1
AND sequence_number >= 1
AND sequence_number <= [max value];
[Diagram: in the Journal partition (persistence_id = '57ab...', partition_nr = 1), the query has to read through the tombstones for seq_nr = 1, 2, 3 and 4 (marker = 'A', NO DATA HERE)]
Avoid Tombstone Hell
We need a way to avoid reading tombstones when replaying messages.
SELECT * FROM messages
WHERE persistence_id = ?
AND partition_number = ?
AND sequence_number >= ?
AND sequence_number <= ?;
If we know what sequence number we've already deleted to before we query, we could make the lower bound (sequence_number >= ?) smarter.
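The effect of a smarter lower bound can be shown with a toy model of one partition, where deleted rows are represented by a tombstone sentinel. This only counts how many tombstones fall inside the queried range; it says nothing about Cassandra's real read path, and all names are hypothetical.

```python
TOMBSTONE = object()  # stands in for a deleted cell that is still on disk


def scan(partition, from_nr, to_nr):
    """Scan a range and return (live sequence numbers, tombstones read)."""
    live, tombstones = [], 0
    for sequence_nr in sorted(partition):
        if from_nr <= sequence_nr <= to_nr:
            if partition[sequence_nr] is TOMBSTONE:
                tombstones += 1
            else:
                live.append(sequence_nr)
    return live, tombstones


# Messages 1 to 4 were physically deleted; 5 and 6 are still live.
partition = {n: TOMBSTONE for n in range(1, 5)}
partition.update({5: b"\x00", 6: b"\x00"})
deleted_to = 4

naive = scan(partition, 1, 6)
smart = scan(partition, max(1, deleted_to + 1), 6)
print(naive, smart)
```

Both scans return the same live messages, but the one that starts at deleted_to + 1 never touches a tombstone.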
A Third Option for Deletes
• Use marker as a clustering column, but change the clustering order (marker first, then sequence_number)
– Messages still 'A', Deletes 'D'
• Read all messages between two sequence numbers
CREATE TABLE messages (
persistence_id text,
partition_number bigint,
marker text,
sequence_number bigint,
message blob,
PRIMARY KEY (
(persistence_id, partition_number),
marker, sequence_number)
);
SELECT * FROM messages
WHERE persistence_id = ?
AND partition_number = ?
AND marker = 'A'
AND sequence_number >= ?
AND sequence_number <= ?;
(repeat until we reach sequence number or run out of partitions)
... sequence_number  marker  message
... 1                A       0x00
... 2                A       0x00
... 3                A       0x00
A Third Option for Deletes
• Messages data no longer has deleted information, so how do we know what's already been deleted?
• Get max deleted sequence number
• Can avoid tombstones if done before getting message data
SELECT sequence_number FROM messages
WHERE persistence_id = ?
AND partition_number = ?
AND marker = 'D'
ORDER BY marker DESC,
sequence_number DESC
LIMIT 1;
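The two-step read for this third option can be sketched with one partition modeled as a dict keyed by (marker, sequence_number), mirroring the clustering order. The helper names are hypothetical, and the sketch assumes a delete is recorded as a single 'D' row carrying the sequence number that was deleted to.

```python
def delete_to(partition, to_sequence_nr):
    """Record a delete with a single 'D' row instead of one row per message."""
    partition[("D", to_sequence_nr)] = None


def max_deleted(partition):
    """Equivalent of the SELECT ... marker = 'D' ... LIMIT 1 query."""
    return max((seq for (marker, seq) in partition if marker == "D"), default=0)


def read_messages(partition, from_nr, to_nr):
    """Read 'A' rows only, starting after the highest deleted sequence number."""
    lower = max(from_nr, max_deleted(partition) + 1)
    keys = sorted(
        key for key in partition if key[0] == "A" and lower <= key[1] <= to_nr
    )
    return [(seq, partition[(marker, seq)]) for (marker, seq) in keys]


partition = {("A", n): b"\x00" for n in range(1, 5)}
delete_to(partition, 2)

print(max_deleted(partition))
print(read_messages(partition, 1, 4))
```

Because the lower bound is raised before the message query runs, any tombstones left by physically deleting messages 1 and 2 would sit outside the scanned range.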
A Third Option for Deletes
• Pros
– Message data stays immutable
– Issue a single INSERT when deleting to a sequence number
– Read a single row to find out what's been deleted (no more scanning)
– Can avoid reading tombstones created by physical deletes
• Cons
– Requires a separate query to find out what's been deleted before getting message data
Lessons Learned
Summary
• Seemingly simple data models can get a lot more complicated
• Avoid unbounded partition growth
– Add data to your partition key
• Be aware of how Cassandra Logged Batches work
– If you need isolation, only write to a single partition
• Avoid queue-like data sets and be aware of how tombstones might impact your queries
– Try to query with ranges that avoid tombstones
Questions?
@LukeTillman
https://www.linkedin.com/in/luketillman/
https://github.com/LukeTillman/

(From the Cassandra Japan Meetup in Tokyo, March 2016)

running stable diffusion on android
Koan-Sin Tan
 
Lotusphere 2007 AD505 DevBlast 30 LotusScript Tips
Lotusphere 2007 AD505 DevBlast 30 LotusScript TipsLotusphere 2007 AD505 DevBlast 30 LotusScript Tips
Lotusphere 2007 AD505 DevBlast 30 LotusScript Tips
Bill Buchan
 
Introduction to C ++.pptx
Introduction to C ++.pptxIntroduction to C ++.pptx
Introduction to C ++.pptx
VAIBHAVKADAGANCHI
 
Tech Talk: Best Practices for Data Modeling
Tech Talk: Best Practices for Data ModelingTech Talk: Best Practices for Data Modeling
Tech Talk: Best Practices for Data Modeling
ScyllaDB
 
How to tune a query - ODTUG 2012
How to tune a query - ODTUG 2012How to tune a query - ODTUG 2012
How to tune a query - ODTUG 2012
Connor McDonald
 
What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0What's new in MariaDB TX 3.0
What's new in MariaDB TX 3.0
MariaDB plc
 

Event Sourcing with Cassandra (from Cassandra Japan Meetup in Tokyo March 2016)

  • 1. Event Sourcing with Cassandra Luke Tillman Technical Evangelist @LukeTillman
  • 2. Who are you? • Evangelist with a focus on Developers – Long-time Developer on RDBMS (lots of .NET) • I still write a lot of code, but now I also do a lot of teaching and speaking
  • 3. A Quick Recap of Event Sourcing
  • 4. Persistence with Event Sourcing • Instead of keeping the current state, keep a journal of all the deltas (events) • Append only (no UPDATE or DELETE) • We can replay our journal of events to get the current state. Diagram: event journal for Shopping Cart (id = 1345): Cart Created (user_id = 4762, created_on = 7/10/2015…); Item Added (item_id = 7621, quantity = 1, price = 19.99); Item Added (item_id = 9134, quantity = 2, price = 16.99); Item Removed (item_id = 7621); Qty Changed (item_id = 9134, quantity = 1)
  • 5. Event Sourcing in Practice • Typically two kinds of storage: – Event Journal Store – Snapshot Store • A history of how we got to the current state can be useful • We've also got a lot more data to store than we did before. Diagram: the same Shopping Cart (id = 1345) event journal as on slide 4
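The replay idea from the recap slides can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the event names and field values come from the shopping cart diagram, while `Cart`, `apply_event`, `replay` and the tuple-based event encoding are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# An event is (event_type, payload); this encoding is an assumption of the sketch.
Event = Tuple[str, dict]


@dataclass
class Cart:
    cart_id: int
    user_id: Optional[int] = None
    # item_id -> {"quantity": ..., "price": ...}
    items: Dict[int, dict] = field(default_factory=dict)


def apply_event(cart: Cart, event: Event) -> Cart:
    """Apply one journal entry (a delta) to the in-memory state."""
    kind, data = event
    if kind == "CartCreated":
        cart.user_id = data["user_id"]
    elif kind == "ItemAdded":
        cart.items[data["item_id"]] = {
            "quantity": data["quantity"],
            "price": data["price"],
        }
    elif kind == "ItemRemoved":
        cart.items.pop(data["item_id"], None)
    elif kind == "QtyChanged":
        cart.items[data["item_id"]]["quantity"] = data["quantity"]
    else:
        raise ValueError(f"unknown event type: {kind}")
    return cart


def replay(cart_id: int, journal: List[Event]) -> Cart:
    """Rebuild the current state by replaying the append-only journal in order."""
    cart = Cart(cart_id=cart_id)
    for event in journal:
        apply_event(cart, event)
    return cart


# The journal shown in the slide diagram for Shopping Cart (id = 1345).
JOURNAL: List[Event] = [
    ("CartCreated", {"user_id": 4762}),
    ("ItemAdded", {"item_id": 7621, "quantity": 1, "price": 19.99}),
    ("ItemAdded", {"item_id": 9134, "quantity": 2, "price": 16.99}),
    ("ItemRemoved", {"item_id": 7621}),
    ("QtyChanged", {"item_id": 9134, "quantity": 1}),
]

cart = replay(1345, JOURNAL)
```

After the replay, `cart.items` holds only item 9134 with quantity 1, which is the current state the five events describe. A snapshot store would simply persist such a `Cart` so that a replay can start from it instead of from the first event.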
  • 6. Why use Cassandra for Event Sourcing? • Transactional (OLTP) Workload • Sequentially written, immutable data – Looks a lot like time series data • Easy to scale out to capture more events
  • 7. Event Sourcing Example: Akka Persistence
  • 8. Akka Persistence Journal API Summary • Write Method – For a given actor, write a group of messages • Delete Method – For a given actor, permanently or logically delete all messages up to a given sequence number • Read Methods – For a given actor, read back all the messages between two sequence numbers – For a given actor, read the highest sequence number that's been written
  • 9. An Event Journal in Cassandra: Data Modeling for Reads and Writes
  • 10. A Simple First Attempt • Use persistence_id as partition key – all messages for a given persistence Id together • Use sequence_number as clustering column – order messages by sequence number inside a partition. Schema: CREATE TABLE messages ( persistence_id text, sequence_number bigint, message blob, PRIMARY KEY (persistence_id, sequence_number) ); • Read all messages between two sequence numbers: SELECT * FROM messages WHERE persistence_id = ? AND sequence_number >= ? AND sequence_number <= ?; • Read the highest sequence number: SELECT sequence_number FROM messages WHERE persistence_id = ? ORDER BY sequence_number DESC LIMIT 1;
  • 11. A Simple First Attempt • Write a group of messages • Use a Cassandra Batch statement to ensure all messages (success) or no messages (failure) get written: BEGIN BATCH INSERT INTO messages ... ; INSERT INTO messages ... ; INSERT INTO messages ... ; APPLY BATCH; • What's the problem with this data model (ignoring implementing deletes for now)? Schema: CREATE TABLE messages ( persistence_id text, sequence_number bigint, message blob, PRIMARY KEY (persistence_id, sequence_number) );
  • 12. Unbounded Partition Growth • Cassandra has a hard limit of 2 billion cells in a partition • But there's also a practical limit – Depends on row/cell data size, but likely not more than millions of rows. Diagram: each INSERT INTO messages ... for persistence_id = '57ab...' adds another row (seq_nr = 1, seq_nr = 2, ..., each with message = 0x00...) to the same Journal partition, which keeps growing (∞?)
  • 13. Fixing the Unbounded Partition Growth Problem • General strategy: add a column to the partition key – Compound partition key • Can be data that's already part of the model, or a "synthetic" column • Allow users to configure a partition size in the plugin – Partition Size = number of rows per partition – This should not be changeable once messages have been written • Partition number for a given sequence number is then easy to calculate – partition = (seqNr – 1) / partitionSize, using integer division; with partitionSize = 100: (100 – 1) / 100 = partition 0 and (101 – 1) / 100 = partition 1. Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, sequence_number bigint, message blob, PRIMARY KEY ( (persistence_id, partition_number), sequence_number) );
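The partition-number formula from slide 13 can be written out as a small Python function. The function name and the argument validation are assumptions of this sketch; the arithmetic is the slide's (seqNr – 1) / partitionSize with integer division.

```python
def partition_number(sequence_nr: int, partition_size: int) -> int:
    """Zero-based partition for a 1-based sequence number.

    Sequence numbers 1..partition_size land in partition 0,
    the next partition_size numbers in partition 1, and so on.
    """
    if sequence_nr < 1:
        raise ValueError("sequence numbers are assumed to start at 1")
    if partition_size < 1:
        raise ValueError("partition_size must be positive")
    # Floor division is the integer division the slide's examples rely on.
    return (sequence_nr - 1) // partition_size


# The two worked examples from the slide, with partitionSize = 100.
p_100 = partition_number(100, 100)
p_101 = partition_number(101, 100)
```

Because the result depends on `partition_size`, changing the configured size after messages have been written would send reads to the wrong partitions, which is why the slide says the setting should not be changeable afterwards.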
  • 14. Fixing the Unbounded Partition Growth Problem • Read all messages between two sequence numbers: SELECT * FROM messages WHERE persistence_id = ? AND partition_number = ? AND sequence_number >= ? AND sequence_number <= ?; (repeat until we reach sequence number or run out of partitions) • Read the highest sequence number: SELECT sequence_number FROM messages WHERE persistence_id = ? AND partition_number = ? ORDER BY sequence_number DESC LIMIT 1; (repeat until we run out of partitions) Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, sequence_number bigint, message blob, PRIMARY KEY ( (persistence_id, partition_number), sequence_number) );
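The "repeat until we run out of partitions" note on slide 14 amounts to a loop over partition numbers. Below is a minimal Python sketch of that loop against an in-memory stand-in for the table. The `journal` dict, `highest_in_partition` and `highest_sequence_number` are hypothetical names, and the sketch assumes partitions are numbered from 0 with no empty partition in between.

```python
from typing import Dict, List, Optional, Tuple

# In-memory stand-in for the messages table:
# (persistence_id, partition_number) -> sequence numbers stored in that partition.
Journal = Dict[Tuple[str, int], List[int]]


def highest_in_partition(journal: Journal, persistence_id: str,
                         partition: int) -> Optional[int]:
    """Stands in for the per-partition query:
    SELECT sequence_number ... ORDER BY sequence_number DESC LIMIT 1
    Returns None when the partition has no rows."""
    rows = journal.get((persistence_id, partition), [])
    return max(rows) if rows else None


def highest_sequence_number(journal: Journal, persistence_id: str) -> int:
    """Walk partitions 0, 1, 2, ... until one comes back empty."""
    highest = 0  # 0 means "nothing written yet" in this sketch
    partition = 0
    while True:
        found = highest_in_partition(journal, persistence_id, partition)
        if found is None:
            return highest
        highest = found
        partition += 1


journal: Journal = {
    ("57ab", 0): list(range(1, 101)),  # sequence numbers 1..100
    ("57ab", 1): [101, 102],
}
result = highest_sequence_number(journal, "57ab")
```

The cost is one query per partition, so a long-lived persistence id pays for its history on every lookup unless the caller can start from a known partition (for example one remembered from a snapshot).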
  • 15. Fixing the Unbounded Partition Growth Problem • Write a group of messages: BEGIN BATCH INSERT INTO messages ... ; INSERT INTO messages ... ; INSERT INTO messages ... ; APPLY BATCH; • A Cassandra Batch statement might now write to multiple partitions (if the sequence numbers cross a partition boundary) • Is that a problem? Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, sequence_number bigint, message blob, PRIMARY KEY ( (persistence_id, partition_number), sequence_number) );
  • 16. RTFM: Cassandra Batches Edition • "Batches are atomic by default. In the context of a Cassandra batch operation, atomic means that if any of the batch succeeds, all of it will." - DataStax CQL Docs http://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html • "Although an atomic batch guarantees that if any part of the batch succeeds, all of it will, no other transactional enforcement is done at the batch level. For example, there is no batch isolation. Clients are able to read the first updated rows from the batch, while other rows are still being updated on the server." - DataStax CQL Docs http://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html • Atomic? That's kind of a loaded word.
  • 17. Multiple Partition Batch Failure Scenario. Diagram: Journal, RF = 3
  • 18. Multiple Partition Batch Failure Scenario. Diagram: Journal, BEGIN BATCH ... APPLY BATCH; CL = QUORUM, RF = 3
  • 19. Multiple Partition Batch Failure Scenario. Diagram: Journal, BEGIN BATCH ... APPLY BATCH; Batch Log (×3), CL = QUORUM, RF = 3
  • 20. Multiple Partition Batch Failure Scenario • Once written to the Batch Log successfully, we know all the writes in the batch will succeed eventually (atomic?) Diagram: Journal, BEGIN BATCH ... APPLY BATCH; CL = QUORUM, RF = 3
  • 21. Multiple Partition Batch Failure Scenario • Once written to the Batch Log successfully, we know all the writes in the batch will succeed eventually (atomic?) Diagram: Journal, BEGIN BATCH ... APPLY BATCH; CL = QUORUM, RF = 3
  • 22. Multiple Partition Batch Failure Scenario • Once written to the Batch Log successfully, we know all the writes in the batch will succeed eventually (atomic?) • Batch has been partially applied. Diagram: Journal, BEGIN BATCH ... APPLY BATCH; CL = QUORUM, RF = 3
  • 23. Multiple Partition Batch Failure Scenario • Once written to the Batch Log successfully, we know all the writes in the batch will succeed eventually (atomic?) • Batch has been partially applied • Possible to read a partially applied batch since there is no batch isolation. Diagram: Journal, BEGIN BATCH ... APPLY BATCH; CL = QUORUM, RF = 3, WriteTimeout - writeType = BATCH
  • 24. RTFM: Cassandra Batches Edition Part 2 • "For example, there is no batch isolation. Clients are able to read the first updated rows from the batch, while other rows are still being updated on the server. However, transactional row updates within a partition key are isolated: clients cannot read a partial update." - DataStax CQL Docs http://docs.datastax.com/en/cql/3.1/cql/cql_reference/batch_r.html • What we really need is Isolation. When writing a group of messages, ensure that we write the group to a single partition.
  • 25. Logic Changes to Ensure Batch Isolation • Still use configurable Partition Size – not a "hard limit" but a "best attempt" • On write, see if messages will all fit in the current partition • If not, roll over to the next partition early • Reading is slightly more complicated – For a given sequence number it might be in partition n or (n+1). Diagram (PartitionSize = 100): partition_nr = 1 holds seq_nr = 1 ... seq_nr = 97, seq_nr = 98; the group seq_nr = 99, 100, 101 is written to partition_nr = 2
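One way to express the early rollover from slide 25 in Python is sketched below. This is an assumption about how such logic could look, not the plugin's actual code: `partition_for_batch` and `candidate_partitions` are hypothetical names, and partitions are numbered from 0 using the formula from slide 13 (the slide's diagram labels them from 1).

```python
from typing import Tuple


def nominal_partition(sequence_nr: int, partition_size: int) -> int:
    """Partition the slide-13 formula assigns to a 1-based sequence number."""
    return (sequence_nr - 1) // partition_size


def partition_for_batch(first_seq_nr: int, last_seq_nr: int,
                        partition_size: int) -> int:
    """Pick ONE partition for a whole group of messages.

    If the group would cross a partition boundary, roll over early and
    write every message to the later partition, so the batch stays
    single-partition and therefore isolated.
    """
    first = nominal_partition(first_seq_nr, partition_size)
    last = nominal_partition(last_seq_nr, partition_size)
    if last - first > 1:
        # Guard chosen for this sketch: a group spanning more than one
        # boundary would break the "partition n or n+1" read rule.
        raise ValueError("group spans more than one partition boundary")
    return last


def candidate_partitions(sequence_nr: int, partition_size: int) -> Tuple[int, int]:
    """Where a reader has to look: the nominal partition n, or n + 1
    if the message was part of a group that rolled over early."""
    n = nominal_partition(sequence_nr, partition_size)
    return (n, n + 1)


# Group 99, 100, 101 with PartitionSize = 100 crosses the boundary
# after 100, so the whole group goes to the next partition.
rolled = partition_for_batch(99, 101, 100)
# Group 97, 98 fits in the current partition.
fits = partition_for_batch(97, 98, 100)
```

With this rule a partition can end up holding fewer or slightly more rows than the configured size, which matches the slide's description of Partition Size as a "best attempt" rather than a hard limit.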
  • 27. Option 1: Mark Individual Messages as Deleted • Add an is_deleted column to our messages table. Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, sequence_number bigint, message blob, is_deleted boolean, PRIMARY KEY ( (persistence_id, partition_number), sequence_number) ); • Read all messages between two sequence numbers: SELECT * FROM messages WHERE persistence_id = ? AND partition_number = ? AND sequence_number >= ? AND sequence_number <= ?; (repeat until we reach sequence number or run out of partitions) Example rows as (sequence_number, message, is_deleted): (1, 0x00, true), (2, 0x00, true), (3, 0x00, false), (4, 0x00, false)
  • 28. Option 1: Mark Individual Messages as Deleted • Pros: – On replay, easy to check if a message has been deleted (comes included in message query's data) • Cons: – Messages not immutable any more – Issue lots of UPDATEs to mark each message as deleted – Have to scan through a lot of rows to find max deleted sequence number if we want to avoid issuing unnecessary UPDATEs. Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, sequence_number bigint, message blob, is_deleted boolean, PRIMARY KEY ( (persistence_id, partition_number), sequence_number) );
  • 29. Option 2: Write a Marker Row for Each Deleted Row • Add a marker column and make it a clustering column – Messages written with 'A' – Deletes get written with 'D'. Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, sequence_number bigint, marker text, message blob, PRIMARY KEY ( (persistence_id, partition_number), sequence_number, marker) ); • Read all messages between two sequence numbers: SELECT * FROM messages WHERE persistence_id = ? AND partition_number = ? AND sequence_number >= ? AND sequence_number <= ?; (repeat until we reach sequence number or run out of partitions) Example rows as (sequence_number, marker, message): (1, A, 0x00), (1, D, null), (2, A, 0x00), (3, A, 0x00)
  • 30. Option 2: Write a Marker Row for Each Deleted Row • Pros – On replay, easy to peek at next row to check if deleted (comes included in message query's data) – Message data stays immutable • Cons – Issue lots of INSERTs to mark each message as deleted – Have to scan through a lot of rows to find max deleted sequence number if we want to avoid issuing unnecessary INSERTs – Potentially twice as many rows to store. Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, sequence_number bigint, marker text, message blob, PRIMARY KEY ( (persistence_id, partition_number), sequence_number, marker) );
  • 31. Looking at Physical Deletes • Physically delete messages to a given sequence number • Still probably want to scan through rows to see what's already been deleted first • Can't range delete, so we have to do lots of individual DELETEs: BEGIN BATCH DELETE FROM messages WHERE persistence_id = ? AND partition_number = ? AND marker = 'A' AND sequence_number = ?; ... APPLY BATCH; Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, sequence_number bigint, marker text, message blob, PRIMARY KEY ( (persistence_id, partition_number), sequence_number, marker) );
  • 32. Looking at Physical Deletes • Read all messages between two sequence numbers: SELECT * FROM messages WHERE persistence_id = ? AND partition_number = ? AND sequence_number >= ? AND sequence_number <= ?; (repeat until we reach sequence number or run out of partitions) • With how DELETEs work in Cassandra, is there a potential problem with this query? Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, sequence_number bigint, marker text, message blob, PRIMARY KEY ( (persistence_id, partition_number), sequence_number, marker) );
  • 33. Tombstone Hell: Queue-like Data Sets. Diagram: Journal partition (persistence_id '57ab...', partition_nr 1) with rows seq_nr=1 marker='A' message= 0x00..., seq_nr=2 marker='A' message= 0x00..., ...
  • 34. Tombstone Hell: Queue-like Data Sets • Delete messages to a sequence number: BEGIN BATCH DELETE FROM messages WHERE persistence_id = '57ab...' AND partition_number = 1 AND marker = 'A' AND sequence_number = 1; ... APPLY BATCH; Diagram: Journal partition (persistence_id '57ab...', partition_nr 1) with rows seq_nr=1 marker='A' message= 0x00..., seq_nr=2 marker='A' message= 0x00..., ...
  • 35. Tombstone Hell: Queue-like Data Sets • Delete messages to a sequence number: BEGIN BATCH DELETE FROM messages WHERE persistence_id = '57ab...' AND partition_number = 1 AND marker = 'A' AND sequence_number = 1; ... APPLY BATCH; Diagram: Journal partition (persistence_id '57ab...', partition_nr 1) now holds two versions of each row: seq_nr=1 marker='A' message= 0x00... plus seq_nr=1 marker='A' Tombstone (NO DATA HERE); seq_nr=2 marker='A' message= 0x00... plus seq_nr=2 marker='A' Tombstone (NO DATA HERE); ...
  • 36. Tombstone Hell: Queue-like Data Sets • At some point compaction runs and we don't have two versions any more, but tombstones don't go away immediately – Tombstones remain for gc_grace_seconds – Default is 10 days. Diagram: Journal partition (persistence_id '57ab...', partition_nr 1) with seq_nr=1 marker='A' Tombstone (NO DATA HERE), seq_nr=2 marker='A' Tombstone (NO DATA HERE), ...
  • 37. Tombstone Hell: Queue-like Data Sets • Read all messages between 2 sequence numbers: SELECT * FROM messages WHERE persistence_id = '57ab...' AND partition_number = 1 AND sequence_number >= 1 AND sequence_number <= [max value]; Diagram: Journal partition (persistence_id '57ab...', partition_nr 1) where the query has to read through seq_nr=1, 2, 3, 4, ... marker='A' Tombstone (NO DATA HERE)
  • 38. Avoid Tombstone Hell • We need a way to avoid reading tombstones when replaying messages. SELECT * FROM messages WHERE persistence_id = ? AND partition_number = ? AND sequence_number >= ? AND sequence_number <= ?; • Callout on the lower bound (AND sequence_number >= ?): If we know what sequence number we've already deleted to before we query, we could make that lower bound smarter.
  • 39. A Third Option for Deletes • Use marker as a clustering column, but change the clustering order – Messages still 'A', Deletes 'D'. Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, marker text, sequence_number bigint, message blob, PRIMARY KEY ( (persistence_id, partition_number), marker, sequence_number) ); • Read all messages between two sequence numbers: SELECT * FROM messages WHERE persistence_id = ? AND partition_number = ? AND marker = 'A' AND sequence_number >= ? AND sequence_number <= ?; (repeat until we reach sequence number or run out of partitions) Example rows as (sequence_number, marker, message): (1, A, 0x00), (2, A, 0x00), (3, A, 0x00)
  • 40. A Third Option for Deletes • Messages data no longer has deleted information, so how do we know what's already been deleted? • Get max deleted sequence number: SELECT sequence_number FROM messages WHERE persistence_id = ? AND partition_number = ? AND marker = 'D' ORDER BY marker DESC, sequence_number DESC LIMIT 1; • Can avoid tombstones if done before getting message data. Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, marker text, sequence_number bigint, message blob, PRIMARY KEY ( (persistence_id, partition_number), marker, sequence_number) );
  • 41. A Third Option for Deletes • Pros – Message data stays immutable – Issue a single INSERT when deleting to a sequence number – Read a single row to find out what's been deleted (no more scanning) – Can avoid reading tombstones created by physical deletes • Cons – Requires a separate query to find out what's been deleted before getting message data. Schema: CREATE TABLE messages ( persistence_id text, partition_number bigint, marker text, sequence_number bigint, message blob, PRIMARY KEY ( (persistence_id, partition_number), marker, sequence_number) );
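The read path of the third option (look up the highest 'D' marker first, then raise the lower bound of the message query) can be sketched in Python against an in-memory stand-in for a single partition. `PartitionStub` and its methods are hypothetical names for this sketch; only the 'A' / 'D' marker convention and the two-step read come from the slides.

```python
from typing import Dict, List, Optional, Tuple


class PartitionStub:
    """In-memory stand-in for one (persistence_id, partition_number) partition,
    keyed the way the slide-39 schema clusters rows: (marker, sequence_number)."""

    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, int], Optional[bytes]] = {}

    def write_message(self, sequence_nr: int, message: bytes) -> None:
        # Messages are written with marker 'A' and never modified.
        self.rows[("A", sequence_nr)] = message

    def delete_to(self, sequence_nr: int) -> None:
        # A logical delete is a single INSERT of a 'D' marker row.
        self.rows[("D", sequence_nr)] = None

    def max_deleted(self) -> int:
        # Stands in for: SELECT sequence_number ... WHERE marker = 'D'
        #                ORDER BY ... DESC LIMIT 1
        deleted = [seq for (marker, seq) in self.rows if marker == "D"]
        return max(deleted, default=0)

    def read_messages(self, from_seq: int, to_seq: int) -> List[Tuple[int, bytes]]:
        # Raise the lower bound past everything already deleted, so a
        # real query would not range over tombstoned rows.
        lower = max(from_seq, self.max_deleted() + 1)
        return [
            (seq, self.rows[("A", seq)])
            for (marker, seq) in sorted(self.rows)
            if marker == "A" and lower <= seq <= to_seq
        ]


partition = PartitionStub()
for seq, payload in enumerate([b"a", b"b", b"c", b"d"], start=1):
    partition.write_message(seq, payload)
partition.delete_to(2)

replayed = partition.read_messages(1, 10)
```

In this stub the deleted messages are still present and are simply skipped by the raised lower bound. In Cassandra, a later physical delete of those rows would leave tombstones below that bound, which is exactly the range the adjusted query no longer touches.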
  • 43. Summary • Seemingly simple data models can get a lot more complicated • Avoid unbounded partition growth – Add data to your partition key • Be aware of how Cassandra Logged Batches work – If you need isolation, only write to a single partition • Avoid queue-like data sets and be aware of how tombstones might impact your queries – Try to query with ranges that avoid tombstones