We built an application based on the principles of CQRS and Event Sourcing, using Cassandra and Spark. During the project we encountered a number of challenges and problems with Cassandra and the Spark Cassandra Connector.
In this talk we outline a few of those problems and the actions we took to solve them. While some problems are specific to CQRS and Event Sourcing applications, most of them are use-case independent.
About the Speakers
Matthias Niehoff IT-Consultant, codecentric AG
works as an IT consultant at codecentric AG in Germany. His focus is on big data & streaming applications with Apache Cassandra & Apache Spark. Yet he does not lose track of other tools in the big data area. Matthias shares his experiences at conferences, meetups, and user groups.
Stephan Kepser Senior IT Consultant and Data Architect, codecentric AG
Dr. Stephan Kepser is an expert on cloud computing and big data. He has written several journal articles and blog posts on subjects from both fields. His interests range from legal questions to the architecture and design of cloud computing and big data systems, down to the technical details of NoSQL databases.
4. ! The primary key defines the access path to a table.
! It cannot be changed after the table is filled with data.
! Any change to the primary key,
e.g., adding a column,
requires a data migration.
Data modeling: Primary key
5. ! Well known:
If you delete a column or a whole row,
the data is not really deleted.
Rather, a tombstone is created to mark the deletion.
! Tombstones are only removed much later,
during compactions.
Data modeling: Deletions
6. ! Updates don’t create tombstones;
data is overwritten in place.
! Exception: non-frozen maps, lists, and sets.
! Use frozen<list> (and its frozen counterparts for maps and sets) instead.
Unexpected Tombstones: Built-in Maps, Lists, Sets
7. ! sstable2json shows the contents of SSTable files in JSON
format.
! Usage: go to /var/lib/cassandra/data/keyspace/table
! > sstable2json *-Data.db
! You can actually see what’s in the individual rows of the
data files.
! Replaced by sstabledump as of Cassandra 3.6.
Debug tool: sstable2json
8. Example
CREATE TABLE customer_cache.tenant (
    name text PRIMARY KEY,
    status text
);

SELECT * FROM tenant;

 name | status
------+--------
   ru | ACTIVE
   es | ACTIVE
   jp | ACTIVE
   vn | ACTIVE
   pl | ACTIVE
   cz | ACTIVE
10. ! A strategy to reduce partition size.
! Actually a general strategy in databases to distribute a
table.
! In Cassandra, each partition of a table can be
split over a set of buckets.
! Example: a table of train departure times
is bucketed over the hours of the day.
! The bucket becomes part of the partition key.
! It must be easy to calculate.
Bucketing
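As a sketch of what “easily calculable” means: the snippet below derives an hour-of-day bucket from an event timestamp, which can then be used as part of the partition key. The class and method names are illustrative, not from the original application.

```java
import java.time.Instant;
import java.time.ZoneOffset;

public class Bucketing {

    // Illustrative bucket function: the hour of the day (0-23),
    // derived from the event timestamp in UTC. Cheap to compute,
    // so every client can reconstruct the full partition key.
    static int bucketFor(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis)
                      .atZone(ZoneOffset.UTC)
                      .getHour();
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2016-06-01T08:30:00Z").toEpochMilli();
        System.out.println(bucketFor(ts)); // prints 8
    }
}
```

Because both writer and reader can recompute the bucket from the timestamp alone, no lookup table is needed to locate a partition.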
11. ! Bulk read or write operations should not be performed
synchronously.
! It makes no sense to wait for each individual read in the
bulk to complete.
! Instead:
! send the requests asynchronously to Cassandra, and
! collect the answers.
Bulk Reads or Writes: Asynchronously
12. Example
Session session = cc.openSession();
PreparedStatement getEntries = session.prepare(
    "SELECT * FROM " + keyspace + "." + table +
    " WHERE tenant = ? AND core_system = ? AND change_type = ?" +
    " AND seconds = ? AND bucket = ?");

private List<ResultSetFuture> sendQueries(Collection<Object[]> partitionKeys) {
    List<ResultSetFuture> futures =
        Lists.newArrayListWithExpectedSize(partitionKeys.size());
    for (Object[] keys : partitionKeys) {
        futures.add(session.executeAsync(getEntries.bind(keys)));
    }
    return futures;
}
13. Example
private void processAsyncResults(List<ResultSetFuture> futures) {
    for (ListenableFuture<ResultSet> future : Futures.inCompletionOrder(futures)) {
        // getUnchecked avoids the checked exceptions of future.get()
        ResultSet rs = Futures.getUnchecked(future);
        Row row = rs.one();
        if (row != null) {
            // do your program logic here
        }
    }
}
15. ! The event log fills up over time.
! It requires fine-grained, time-based partitions to avoid
huge partitions.
! Spark jobs read the latest events.
! Access path: time stamp.
! Jobs have to read many partitions.
Cassandra as Streaming Source
16. ! The pull method requires frequent reads of empty
partitions, because no new data has arrived.
! Impossible to re-read the whole table:
by far too many partitions to read
(our case: 900,000 per day, i.e. about 27 million per month).
! Unsuitable for a λ-architecture.
! Eventual consistency:
when can you be sure that you have read all events?
— Never.
! Might improve with CDC.
Cassandra is unsuitable as a time-based event source.
17. ! One keyspace per tenant?
! One (set of) table(s) per tenant?
! Our choice: one table per tenant.
! Feasible only for a limited number of tenants (~1,000).
Separating Data of Different Tenants
18. ! Switch on monitoring.
! ELK, OpsCenter, self-built, ...
! Avoid log level DEBUG for C* messages:
! you will drown in irrelevant messages,
! and there is a substantial performance penalty.
! Log level INFO for development and pre-production.
! Log level ERROR is sufficient in production.
Monitoring
19. ! When writing data,
Cassandra never checks whether there is enough space left
on disk.
! It keeps writing data until the disk is full.
! This can bring the OS to a halt.
! Cassandra’s error messages are confusing at this point.
! Thus monitoring disk space is mandatory.
Monitoring: Disk Space
20. ! Cassandra requires a lot of disk space for compaction.
! For SizeTieredCompactionStrategy, reserve as much free
disk space as Cassandra needs to store the data.
! Set up monitoring on disk space.
! Receive an alert when the data-carrying disk partition fills
up to 50%.
! Then add nodes to the cluster and rebalance.
Monitoring: Disk Space
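A minimal sketch of such a check using only the JDK; the 50% threshold is taken from the slides, while the class name and the idea of polling the data directory are assumptions (a real deployment would wire this into the alerting system):

```java
import java.io.File;

public class DiskSpaceCheck {

    // Threshold from the slides: react when the data-carrying
    // partition is 50% full, so compaction still has room to work.
    static final double ALERT_RATIO = 0.5;

    // Fraction of the partition holding `path` that is already used.
    static double usedRatio(File path) {
        long total = path.getTotalSpace();
        if (total == 0) {
            return 0.0; // path does not name a partition
        }
        return (double) (total - path.getUsableSpace()) / total;
    }

    static boolean shouldAlert(File path) {
        return usedRatio(path) >= ALERT_RATIO;
    }

    public static void main(String[] args) {
        // Hypothetical default; pass the real data directory in production.
        File dataDir = new File(args.length > 0 ? args[0] : "/");
        System.out.printf("used: %.0f%%, alert: %b%n",
                100 * usedRatio(dataDir), shouldAlert(dataDir));
    }
}
```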
24. ! Watch the Spark locality level:
! aim for PROCESS_LOCAL or NODE_LOCAL,
! avoid ANY.
Spark locality
25. ! the Spark job does not run on every C* node:
! # Spark nodes < # Cassandra nodes
! # job cores < # Cassandra nodes
! all Spark job cores are on one node
! the time for repartitioning exceeds the time saved through locality
Do not use repartitionByCassandraReplica when ...
26. ! one query per partition key
! one query at a time per executor
joinWithCassandraTable
(Diagram: a Spark executor issues its queries to Cassandra one after another, serialized over t, t+1, ..., t+5.)
35. ! cache() saves time by preventing recalculation.
! It also helps you with correctness!
Cache! Not only for performance
val changedStream = someDStream.map(e => someMethod(e)).cache()
changedStream.saveToCassandra("keyspace", "table1")
changedStream.saveToCassandra("keyspace", "table2")

def someMethod(e: Entry): ChangedEntry =
  new ChangedEntry(new Date(), ...)
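The correctness point can be illustrated without Spark: when the map function is non-deterministic (like `new Date()` above), every action on an uncached stream recomputes it, so two writes of "the same" stream see different values; caching materializes the result once. A minimal sketch with a plain supplier standing in for the DStream (all names are illustrative):

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CacheDemo {

    static long counter = 0;

    // Stands in for someMethod(e): the output depends on mutable
    // state (a counter here, `new Date()` in the slides).
    static String transform(String e) {
        return e + "-" + (counter++);
    }

    public static void main(String[] args) {
        // Uncached: every consumer re-runs the map function.
        Supplier<List<String>> uncached = () ->
            Stream.of("a", "b")
                  .map(CacheDemo::transform)
                  .collect(Collectors.toList());

        List<String> firstWrite  = uncached.get(); // [a-0, b-1]
        List<String> secondWrite = uncached.get(); // [a-2, b-3]
        System.out.println(firstWrite.equals(secondWrite)); // prints false

        // "Cached": materialize the result once, reuse it for both writes.
        List<String> cached = uncached.get();
        List<String> thirdWrite  = cached;
        List<String> fourthWrite = cached;
        System.out.println(thirdWrite.equals(fourthWrite)); // prints true
    }
}
```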
36. • Know the most important internals
• Know your tools
• Monitor your cluster
• Use existing knowledge resources
• Use the mailing lists
• Participate in the community
Summary