CQRS and Event Sourcing Applications with Cassandra_
Matthias Niehoff
#CassandraSummit 2015
Agenda_
•The Use Case
•Event Sourcing
•CQRS
•Cassandra for Storage
•Spark for Processing
•Benefits & Pitfalls
•Q&A
The Use Case
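24x7 Proxy_
[Diagram: the 24x7 Proxy sits between the legacy systems (not available 24x7) and the “Internet-ready” applications (available 24x7)]
•Caches data from the legacy systems
•Provides data to the applications
•Stores changes made by the applications
•Provides these changes to the legacy systems
•No business logic or validation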
Assumptions_
•The solution needs to be highly scalable (up to 100,000 reads/s and 10,000 writes/s)
•Read and write access must have low latency
•The read/write ratio is 10:1 or higher
•The solution needs to handle up to 500,000,000 customers
Event Sourcing
Traditional Pattern: Saving Application State_
[Class diagram: a Store (ID, Address) sells Articles (Name, StockSize; updateInventory(), getInventory())]
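To make the contrast concrete, here is a minimal Python sketch of the state-saving model. The class and its delta-based update_inventory() signature are hypothetical, loosely mirroring the methods in the diagram; they are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """State-oriented model: only the current stock size is stored."""
    name: str
    stock_size: int

    def update_inventory(self, delta: int) -> None:
        # Overwrites the stored state; how the value came about is lost.
        self.stock_size += delta

    def get_inventory(self) -> int:
        return self.stock_size


tablet = Article("Fancy Tablet", 60)
tablet.update_inventory(-20)  # sell 20
tablet.update_inventory(10)   # replenish 10
print(tablet.get_inventory())  # 50 (the sale and the replenishment themselves are not recorded)
```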
What is different with Event Sourcing?_
A series of sales and replenishments for
 • a tablet: starting with 60, sell 20, replenish 10
 • a stove: starting with 25, sell 5, no replenishments
What is the Difference?_
Saving only application state:
[Figure: one ArticleInventory record per article holding the current stock: Fancy Tablet 50, Gas Stove 20]
What is the Difference?_
Saving events instead of state:
[Figure: a log of timestamped ArticleInventory entries (15-08-14T19:..): Fancy Tablet 39, Gas Stove 20, Fancy Tablet 45, Gas Stove 20, Fancy Tablet 50, Gas Stove 20]
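The tablet and stove scenario can be replayed from events. This Python sketch assumes a hypothetical StockChanged event that records a stock delta and models the starting stock as a first event; the current state is obtained by folding all events over an empty state.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class StockChanged:
    article: str
    delta: int  # negative for a sale, positive for a replenishment


# Append-only event log; the starting stock is modelled as a first event.
events = [
    StockChanged("Fancy Tablet", 60),
    StockChanged("Gas Stove", 25),
    StockChanged("Fancy Tablet", -20),
    StockChanged("Gas Stove", -5),
    StockChanged("Fancy Tablet", 10),
]


def apply(state: dict, event: StockChanged) -> dict:
    # Returns a new state; the event log itself is never modified.
    new_state = dict(state)
    new_state[event.article] = new_state.get(event.article, 0) + event.delta
    return new_state


current = reduce(apply, events, {})
print(current)  # {'Fancy Tablet': 50, 'Gas Stove': 20}
```

The figure above records inventory entries rather than deltas; both are ways to store events, and this sketch simply uses deltas.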
Benefits of Storing Events_
•Log of all stock changes
•Complete rebuild of the state
•Temporal queries
•Event replay and rollback
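Temporal queries and rollback both follow from replay: the state at any point in time is the fold over all events up to that point. A minimal sketch, assuming integer logical timestamps instead of real insert times; state_as_of is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockChanged:
    at: int       # logical timestamp (hypothetical; a real store would use insert_time)
    article: str
    delta: int


log = [
    StockChanged(1, "Fancy Tablet", 60),
    StockChanged(2, "Fancy Tablet", -20),
    StockChanged(3, "Fancy Tablet", 10),
]


def state_as_of(events, until):
    """Temporal query: rebuild the stock levels from all events with at <= until."""
    state = {}
    for event in sorted(events, key=lambda e: e.at):
        if event.at > until:
            break
        state[event.article] = state.get(event.article, 0) + event.delta
    return state


print(state_as_of(log, 2))  # {'Fancy Tablet': 40}
print(state_as_of(log, 3))  # {'Fancy Tablet': 50}, i.e. a complete rebuild of the current state
```

Rolling back to time 2 amounts to rebuilding with state_as_of(log, 2); the later events stay in the log and can be replayed again.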
CQRS
Default Application Architecture_
[Diagram: User Interface → Application Services → Domain Model → DB]
CQRS Application Architecture_
[Diagram: the User Interface uses separate Query Services and Command Services on top of the Domain Model and a single DB]
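A minimal sketch of the split in the diagram, assuming hypothetical ArticleCommandService and ArticleQueryService classes over one shared in-memory store: commands change state and return nothing, queries return data and change nothing.

```python
class InMemoryDB:
    """Hypothetical stand-in for the single DB shared by both sides."""
    def __init__(self):
        self.stock = {}


class ArticleCommandService:
    """Write side: changes state and returns nothing."""
    def __init__(self, db):
        self._db = db

    def replenish(self, article, quantity):
        self._db.stock[article] = self._db.stock.get(article, 0) + quantity

    def sell(self, article, quantity):
        available = self._db.stock.get(article, 0)
        if quantity > available:
            raise ValueError(f"only {available} x {article} in stock")
        self._db.stock[article] = available - quantity


class ArticleQueryService:
    """Read side: answers questions and never changes state."""
    def __init__(self, db):
        self._db = db

    def stock_of(self, article):
        return self._db.stock.get(article, 0)

    def low_stock(self, threshold):
        return sorted(a for a, n in self._db.stock.items() if n < threshold)


db = InMemoryDB()
commands, queries = ArticleCommandService(db), ArticleQueryService(db)
commands.replenish("Fancy Tablet", 60)
commands.sell("Fancy Tablet", 20)
commands.replenish("Gas Stove", 25)
commands.sell("Gas Stove", 5)
print(queries.stock_of("Fancy Tablet"))  # 40
print(queries.low_stock(30))             # ['Gas Stove']
```

Once the two sides are separate classes, nothing forces them to share the DB; that is the starting point for the variations on the next slide.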
A Pattern Changing Your Mindset_
•The pattern itself is simple
•Going further:
 • Split up the domain model
 • Scale the models independently
 • Do without a query model altogether
 • Use different databases for the models
Event Sourcing & CQRS_
[Diagram: Command Services and the Command Model write to the Event Store (DB); Event Processors asynchronously update several Query Stores (DBs); a Read Layer of Query Services reads from the Query Stores]
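The asynchronous flow from event store to query stores can be sketched in plain Python. Everything here is hypothetical: a deque per subscriber stands in for a message queue, and process_pending() is called by hand instead of running in the background. The sketch also shows why query views are only eventually consistent.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class StockChanged:
    article: str
    delta: int


class EventStore:
    """Append-only event log; the command side only ever appends."""
    def __init__(self):
        self.events = []
        self._subscriptions = []

    def subscribe(self):
        # Each event processor gets its own queue, standing in for an async channel.
        queue = deque()
        self._subscriptions.append(queue)
        return queue

    def append(self, event):
        self.events.append(event)
        for queue in self._subscriptions:
            queue.append(event)


class StockProcessor:
    """Event processor that keeps one query store (a dict here) up to date."""
    def __init__(self, store):
        self.queue = store.subscribe()
        self.view = {}

    def process_pending(self):
        # In a real system this runs asynchronously; here it is called explicitly.
        while self.queue:
            event = self.queue.popleft()
            self.view[event.article] = self.view.get(event.article, 0) + event.delta


store = EventStore()
processor = StockProcessor(store)
store.append(StockChanged("Gas Stove", 25))
store.append(StockChanged("Gas Stove", -5))
print(processor.view)  # {} because the query store lags behind the event store
processor.process_pending()
print(processor.view)  # {'Gas Stove': 20}
```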
Storage with Cassandra
Why Cassandra…
•Not only an event sink
 • Compaction
 • Selective replay
•No single point of failure
•Horizontal scaling & geo-replication
•Write ahead of unmodified data
•Plays well with further processing
•Open source & a huge community
•Easy operations
Event Store_
For accessing all entities of a given type:
CREATE TABLE event_source_by_type (
    entity_type TEXT,
    bucket INT,              -- prevents huge partitions
    entity_key TEXT,
    insert_time TIMESTAMP,
    update_time TIMESTAMP,
    payload TEXT,            -- e.g. as JSON or XML; binary formats such as protobuf or Avro need BLOB
    PRIMARY KEY ((entity_type, bucket), insert_time, entity_key)
) WITH CLUSTERING ORDER BY (insert_time DESC, entity_key ASC);
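The slides do not say how bucket is computed. One possible scheme, sketched here in Python under that assumption, is a time bucket of one UTC day, which caps how much a single (entity_type, bucket) partition can grow; hashing entity_key into a fixed number of buckets would be an alternative. The function names are hypothetical, and the CQL text is only built, not executed.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

SELECT_BUCKET = (
    "SELECT entity_key, insert_time, payload FROM event_source_by_type "
    "WHERE entity_type = ? AND bucket = ? AND insert_time >= ? AND insert_time <= ?"
)


def day_bucket(insert_time):
    """Hypothetical scheme: one partition per entity type and UTC day."""
    if insert_time.tzinfo is None:
        raise ValueError("insert_time must be timezone-aware")
    return (insert_time - EPOCH).days


def queries_for_range(entity_type, start, end):
    """A reader must query every bucket that overlaps [start, end]."""
    return [
        (SELECT_BUCKET, (entity_type, bucket, start, end))
        for bucket in range(day_bucket(start), day_bucket(end) + 1)
    ]


start = datetime(2020, 1, 1, 18, 0, tzinfo=timezone.utc)
end = datetime(2020, 1, 3, 6, 0, tzinfo=timezone.utc)
queries = queries_for_range("article", start, end)
print(len(queries))  # 3, one query per UTC day touched by the range
```

The trade-off: day buckets keep partitions bounded, but all writes of one type on one day hit the same partition.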
Optional: Second Table_
For accessing an entity directly:
CREATE TABLE event_source_by_key (
    entity_type TEXT,
    entity_key TEXT,
    insert_time TIMESTAMP,
    update_time TIMESTAMP,
    payload TEXT,            -- e.g. as JSON or XML; binary formats such as protobuf need BLOB
    PRIMARY KEY ((entity_type, entity_key), insert_time)
) WITH CLUSTERING ORDER BY (insert_time DESC);
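If the optional second table is used, every event has to be written to both tables. A sketch, assuming a hypothetical event_writes() helper that only builds the two INSERT statements with ? placeholders; it does not call any driver. Sending both statements in one logged batch means that if either write is applied, the other eventually is as well (logged batches give no isolation, though).

```python
from datetime import datetime, timezone

INSERT_BY_TYPE = (
    "INSERT INTO event_source_by_type "
    "(entity_type, bucket, entity_key, insert_time, update_time, payload) "
    "VALUES (?, ?, ?, ?, ?, ?)"
)
INSERT_BY_KEY = (
    "INSERT INTO event_source_by_key "
    "(entity_type, entity_key, insert_time, update_time, payload) "
    "VALUES (?, ?, ?, ?, ?)"
)


def event_writes(entity_type, entity_key, bucket, insert_time, payload):
    """Statements that write one event to both event tables.

    Meant to be sent together in one logged batch. Setting update_time to
    insert_time on the first write is an assumption of this sketch.
    """
    return [
        (INSERT_BY_TYPE, (entity_type, bucket, entity_key, insert_time, insert_time, payload)),
        (INSERT_BY_KEY, (entity_type, entity_key, insert_time, insert_time, payload)),
    ]


writes = event_writes(
    "article", "fancy-tablet", 0,  # bucket value depends on the chosen bucketing scheme
    datetime(2020, 1, 1, tzinfo=timezone.utc), '{"delta": -20}',
)
print(len(writes))  # 2
```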
Query Stores_
•Create tables that fit your queries!
•E.g. “Get articles in category ‘computer’”
CREATE TABLE articles_by_category (
    category TEXT,           -- a large category may need bucketing
    article_id UUID,
    article_info TEXT,       -- could also be a JSON document
    PRIMARY KEY ((category), article_id)
);
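A view like articles_by_category is filled by an event processor. This sketch uses hypothetical event tuples and an in-memory dict in place of the table. It shows one consequence of denormalized views: because the category is the partition key, moving an article to another category means deleting the old row and inserting a new one.

```python
from collections import defaultdict


def project_articles_by_category(events):
    """Builds the articles_by_category view: category -> {article_id: article_info}.

    Event shapes are hypothetical: ("ArticleCreated", id, category, info) and
    ("ArticleMoved", id, new_category, None).
    """
    view = defaultdict(dict)
    category_of = {}  # current category per article, needed to find the old row
    for kind, article_id, category, info in events:
        if kind == "ArticleCreated":
            view[category][article_id] = info
        elif kind == "ArticleMoved":
            old_category = category_of[article_id]
            # The category is the partition key, so a move is a delete plus an insert.
            info = view[old_category].pop(article_id)
            view[category][article_id] = info
        else:
            continue  # ignore event types this view does not care about
        category_of[article_id] = category
    return view


events = [
    ("ArticleCreated", "a1", "computer", "Fancy Tablet"),
    ("ArticleCreated", "a2", "kitchen", "Gas Stove"),
    ("ArticleCreated", "a3", "kitchen", "Tablet Case"),
    ("ArticleMoved", "a3", "computer", None),
]
view = project_articles_by_category(events)
print(sorted(view["computer"]))  # ['a1', 'a3']
```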
Query Stores_
“I need ad-hoc queries”
“I need specific queries with a lot of different filters”
Processing with Spark
From Event Store to Query Store_
•The command model triggers the event processors
•The event processors update the query views
[Diagram: Command Model (DB) → several Event Processors → one query DB each]
Event Processing in Detail_
[Diagram: Command Model and DBs]
Why Spark?
•Easy scale-out
•Easy deployment
•Intuitive Scala & Java API
•Fault tolerant
•Out-of-the-box Kafka adapter
•Integrates well with Cassandra
Spark Job in Detail_
•Spark Streaming application
•Consumes only the topics of interest
•Joins the stream of events with the current view
 • Use the primary key of the entity for correlation
 • Use joinWithCassandraTable
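The correlation step can be illustrated without Spark. In this pure-Python sketch, the dictionary lookup plays the role that joinWithCassandraTable plays in the Spark Cassandra Connector (fetching the current row by primary key); process_micro_batch and apply_stock are hypothetical names, not connector API.

```python
def process_micro_batch(events, current_view, apply):
    """One micro-batch: correlate events with the current view by entity key.

    events:       list of (entity_key, event) in arrival order
    current_view: dict entity_key -> current row
    apply:        pure function (row or None, event) -> new row
    Returns only the rows that changed, i.e. what would be written back.
    """
    by_key = {}
    for key, event in events:
        by_key.setdefault(key, []).append(event)  # keeps arrival order per key
    updates = {}
    for key, key_events in by_key.items():
        row = current_view.get(key)  # the "join" on the primary key
        for event in key_events:
            row = apply(row, event)
        updates[key] = row
    return updates


def apply_stock(row, delta):
    return (row or 0) + delta


view = {"fancy-tablet": 60, "gas-stove": 25}
batch = [("fancy-tablet", -20), ("gas-stove", -5), ("fancy-tablet", 10)]
updates = process_micro_batch(batch, view, apply_stock)
view.update(updates)
print(view)  # {'fancy-tablet': 50, 'gas-stove': 20}
```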
If you need a new query view_
1. Create a table for the query view
2. Create a Spark job filling your table
3. Deploy the Spark job
4. Initially reprocess the event DB
 • same transformation logic as in normal processing
 • the source can be different
5. Mark the view as initialized
[Diagram: Event DB → Query DB]
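Steps 4 and 5 can be sketched as follows, assuming a hypothetical apply_stock() transformation shared with the streaming job and an in-memory list standing in for the event DB.

```python
def apply_stock(row, delta):
    """The same transformation the streaming job uses (hypothetical)."""
    return (row or 0) + delta


class QueryView:
    def __init__(self):
        self.rows = {}
        self.initialized = False  # step 5: readers should only trust the view once set


def initialize_view(view, stored_events, apply):
    """Step 4: reprocess the event DB with the same logic as normal processing."""
    for entity_key, event in stored_events:  # source: the event DB instead of the stream
        view.rows[entity_key] = apply(view.rows.get(entity_key), event)
    view.initialized = True


event_db = [
    ("fancy-tablet", 60), ("gas-stove", 25),
    ("fancy-tablet", -20), ("gas-stove", -5), ("fancy-tablet", 10),
]
view = QueryView()
initialize_view(view, event_db, apply_stock)
print(view.rows, view.initialized)  # {'fancy-tablet': 50, 'gas-stove': 20} True
```

This sketch ignores events that arrive while the replay runs; a real job has to account for them, for example by starting the stream from the point where the replay began.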
Benefits & Pitfalls
Benefits_
•Scalability
 • Storage & processing: just add nodes
 • Efficient queries thanks to the separation of reads and writes
•Collaboration
 • Every client gets its own data access
 • Easy to support new queries
Pitfalls_
•More complexity than simple CRUD
•Side effects on event replay
•Eventual consistency in query views
•Concurrent writes
•Performance of replay
Pitfalls_
Lost Updates
•Due to parallel processing
 • Two events A and B arrive in sequence
 • but A is processed after B
•Solution
 • Partition the Spark RDD by entity key
 • Use a lambda architecture
[Diagram: lambda architecture: the data stream feeds a speed layer and a batch layer, which both feed the serving layer]
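The effect of partitioning by entity key can be simulated in Python. In this hypothetical setup each event carries the entity's new state, partitions are processed in an arbitrary order (reversed here to force the problem), and events within a partition are applied sequentially. In Spark the corresponding step would be partitioning the pair RDD by key, e.g. with a HashPartitioner.

```python
import zlib


def round_robin(events, n):
    """Naive distribution: events of one entity can land in different partitions."""
    partitions = [[] for _ in range(n)]
    for i, event in enumerate(events):
        partitions[i % n].append(event)
    return partitions


def by_entity_key(events, n):
    """All events of an entity go to the same partition, in arrival order."""
    partitions = [[] for _ in range(n)]
    for key, value in events:
        partitions[zlib.crc32(key.encode("utf-8")) % n].append((key, value))
    return partitions


def process(partitions):
    """Partitions run in parallel, so their relative order is arbitrary (reversed here)."""
    view = {}
    for partition in reversed(partitions):
        for key, value in partition:  # sequential within a partition
            view[key] = value         # each event carries the entity's new state
    return view


# Event A sets the tablet's stock to 40, the later event B sets it to 50.
events = [("fancy-tablet", 40), ("gas-stove", 20), ("fancy-tablet", 50)]
print(process(round_robin(events, 3))["fancy-tablet"])    # 40: B was overwritten by the older A
print(process(by_entity_key(events, 3))["fancy-tablet"])  # 50
```

This only preserves the order in which events reach the job; the source (e.g. a Kafka topic) must also deliver the events of one key in order.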
Pitfalls_
•Event Store Compaction
 • Compact the store to improve processing time
 • Only keep the latest entry per entity key
 • e.g. with a Spark batch job / Cassandra TTL
•Snapshot / Master State
 • Continuously build a complete state of all data
 • Can be used
  • to speed up initialization
  • as a store for a search engine
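A sketch of compaction and snapshot-based initialization, assuming each stored event carries the entity's full state together with a sequence number; compact() and restore() are hypothetical helpers. Delta events could not be compacted this way, since dropping older deltas would change the result.

```python
def compact(events):
    """Keep only the latest entry per entity key.

    events: list of (sequence_number, entity_key, full_state)
    """
    latest = {}
    for seq, key, state in events:
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, key, state)
    return sorted(latest.values())


def restore(snapshot, snapshot_seq, events):
    """Start from a snapshot (master state) and replay only newer events."""
    state = dict(snapshot)
    for seq, key, value in sorted(events):
        if seq > snapshot_seq:
            state[key] = value
    return state


events = [(1, "fancy-tablet", 60), (2, "gas-stove", 25), (3, "fancy-tablet", 40), (4, "fancy-tablet", 50)]
print(compact(events))  # [(2, 'gas-stove', 25), (4, 'fancy-tablet', 50)]
snapshot = {"fancy-tablet": 40, "gas-stove": 25}  # master state after event 3
print(restore(snapshot, 3, events))  # {'fancy-tablet': 50, 'gas-stove': 25}
```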
The Use Case Solved with ES & CQRS
24x7 Proxy_
[Diagram: the 24x7 Proxy between the legacy core systems (not available 24x7) and the “Internet-ready” applications (available 24x7)]
Questions?
Thank You!
Matthias Niehoff,
IT Consultant
codecentric AG
Zeppelinstraße 2
76185 Karlsruhe, Germany
www.codecentric.de
blog.codecentric.de
matthiasniehoff