24. May the node be with you!
Robert Stupp
Solutions Architect @ DataStax, Committer to Apache Cassandra
robert.stupp@datastax.com
@snazy
Editor's Notes
Frankly, "the cloud" started with... the iPhone
Think the "cloud way"
Nothing worse than customers not reaching your service --> lose money
Users’ apps are always on – so should your database
Answers must come really quickly - https://www.nngroup.com/articles/website-response-times/
0.1 seconds gives the feeling of instantaneous response
1 second keeps the user's flow of thought seamless
10 seconds keeps the user's attention
Number of different devices
Netflix example
Network latency – bring the data to the users - http://www.verizonenterprise.com/about/network/latency/
45ms within US
30ms within Europe
90ms London – New York
160ms Trans Pacific
250ms Europe – Asia
CANNOT BEAT THAT – IT’S BARE PHYSICS
Add more nodes for more transactions, more data
EC (eventual consistency) means:
There is a time gap between the first write and the moment the data is available on all replicas
MENTION: Replication to other data center
QUORUM means MAJORITY
MAJORITY of 3 is 2
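The majority rule can be written as a one-liner. A minimal sketch in Python; the function name `quorum` is only for illustration, and the formula is the usual floor(RF / 2) + 1:

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: QUORUM={quorum(rf)}")
```

With 3 replicas, one node can be down and QUORUM reads and writes still succeed.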
AFTER:
Mention LOCAL_ONE, LOCAL_QUORUM, TWO, THREE, ALL, EACH_QUORUM
1. DSE write path
2. DSE node
3. Memtable
4. Commit Log
5. Application wants to write some data
6. ... goes to the memtable
7. ... written to the commit log (CL), which is replayed after a node restart
---------
8. (hint: SSTables)
9. much data written over time --> memtable grows
10. memtable flushed to SSTable
11. more sstables (ANIMATED!)
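The steps above can be sketched as a toy model. This is not DSE code: the class `Node`, the row-count flush threshold `memtable_limit`, and the use of plain dicts and lists are all assumptions made for illustration.

```python
class Node:
    """Toy model of one node's write path (illustrative only)."""

    def __init__(self, memtable_limit: int = 3):
        self.memtable_limit = memtable_limit  # assumed flush threshold, in rows
        self.memtable = {}    # in memory, lost on restart
        self.commit_log = []  # append-only, survives a restart
        self.sstables = []    # immutable "files", oldest first

    def write(self, key, value):
        # Every write is recorded in both the commit log and the memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # The memtable becomes one immutable, sorted SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs to be replayed

    def restart(self):
        # Memory is gone; rebuild the memtable from the commit log.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value
```

Writing two rows, restarting, and then writing a third row leaves an empty memtable and one SSTable holding all three rows.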
Take some similarly sized SSTables and compact them into one bigger one
That’s STCS
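A rough sketch of that idea, assuming SSTables are modeled as dicts with a generation number. The names `SSTable`, `merge`, and `size_tiered_step`, and the values of `min_threshold`, `low`, and `high`, are chosen for illustration and are not taken from the slides:

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    generation: int  # higher means newer
    rows: dict


def merge(tables):
    """Merge SSTables into one; values from newer generations win."""
    merged = {}
    for table in sorted(tables, key=lambda t: t.generation):
        merged.update(table.rows)
    newest = max(t.generation for t in tables)
    return SSTable(newest, dict(sorted(merged.items())))


def size_tiered_step(tables, min_threshold=4, low=0.5, high=1.5):
    """Compact the first bucket holding enough similarly sized SSTables."""
    buckets = []
    for table in sorted(tables, key=lambda t: len(t.rows)):
        for bucket in buckets:
            avg = sum(len(t.rows) for t in bucket) / len(bucket)
            if low * avg <= len(table.rows) <= high * avg:
                bucket.append(table)
                break
        else:
            buckets.append([table])  # no bucket of similar size yet
    for bucket in buckets:
        if len(bucket) >= min_threshold:
            chosen = {id(t) for t in bucket}
            rest = [t for t in tables if id(t) not in chosen]
            return rest + [merge(bucket)]
    return tables  # nothing to compact
```

Four small SSTables of similar size are merged into one; a much larger SSTable sits in its own bucket and is left alone.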
STCS – size-tiered compaction strategy
Default
Multiple, similarly sized SSTables compacted into one
LCS – leveled compaction strategy
Many writes to same partitions
Works fine with SSDs
DTCS – date-tiered compaction strategy
Time series data
TTL’d data
never overwritten
Old SSTables can just be dropped
MENTION: Keyspace, Table
Partition Key: determines the replica nodes
Clustering Key: identifies the CQL row in the partition
MENTION: Size restrictions
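How a partition key leads to replica nodes can be sketched with a simplified token ring. Assumptions in this sketch: one token per node, replicas picked by walking the ring clockwise, no data center awareness, and MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3). The names `token` and `replicas` are hypothetical.

```python
import hashlib
from bisect import bisect_left


def token(partition_key: str) -> int:
    """Hash the partition key to a 64-bit token (stand-in hash function)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas(partition_key: str, ring, replication_factor: int = 3):
    """ring: list of (token, node_name) sorted by token, one token per node."""
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect_left(tokens, token(partition_key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Eight hypothetical nodes spread evenly over the token range.
ring = sorted((i * 2**61, f"node{i}") for i in range(8))
print(replicas("customer-42", ring))
```

The clustering key plays no part here: all CQL rows of one partition live on the same replicas.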
1) “Logical” means:
Entities
Relations between entities
2) When you know what you ask for, you know:
your queries
the workflow of your application
the data you really need
3) Combine conceptual model and queries
Declare tables and their keys
Add additional views to tables
4) Depending on the workload
Add bucketing (split partitions logically)
Choose the “right” compaction strategy
Consider TTLs
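Bucketing can be illustrated with a time-series example. A minimal sketch, assuming a hypothetical sensor table whose partition key is the pair (sensor id, day); the function name and the one-day bucket size are illustrative choices, not a recommendation from the slides:

```python
from datetime import datetime, timezone


def bucketed_partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Composite partition key (sensor_id, UTC day).

    ts must be timezone-aware; each day of readings for a sensor
    lands in its own partition instead of one ever-growing partition.
    """
    day = ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)


reading_time = datetime(2016, 5, 24, 23, 59, tzinfo=timezone.utc)
print(bucketed_partition_key("sensor-1", reading_time))
```

The query side has to compute the same bucket, so the bucket size must follow from the queries the application really runs.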
Guide through some standard use cases
Partition key not included in the query
--> the coordinator does not know which nodes to ask
Needs two writes:
- to "customers" table
- to "customers_by_email" table
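The two-write pattern can be sketched with two in-memory dicts standing in for the two tables. The table names come from the notes above; the columns (`email`, `name`, `id`) and the function names are assumptions for illustration:

```python
# Each dict stands in for one table, keyed by its partition key.
customers = {}           # partition key: customer id
customers_by_email = {}  # partition key: email


def register(customer_id: str, email: str, name: str) -> None:
    """Write the same customer to both tables."""
    customers[customer_id] = {"email": email, "name": name}
    customers_by_email[email] = {"id": customer_id, "name": name}


def find_by_email(email: str):
    """Single lookup by partition key; no scan over `customers` needed."""
    return customers_by_email.get(email)
```

The data is duplicated on purpose: each query gets a table whose partition key matches what the query asks for.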
( THE NAIVE, RELATIONAL WAY )
- query all addresses
- query address by user and type
Access by partition key --> fine
That’s relational
That’s client side join
EXPLAIN : UDTs
EXPLAIN : collections
MENTION : frozen
Just ONE read - not TWO reads as before
Registration pre-check - THE NAIVE WAY
CLASSICAL RACE CONDITION
LWT
MENTION:
Expensive
Paxos
Pre-checks w/ read not necessary
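The difference between the naive pre-check and a conditional insert (in CQL: `INSERT ... IF NOT EXISTS`) can be sketched in memory. This is only a model: a `threading.Lock` stands in for the Paxos round, and the class and method names are hypothetical.

```python
import threading


class CustomersByEmail:
    """In-memory stand-in for the customers_by_email table."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # stand-in for Paxos agreement

    def naive_register(self, email: str, customer_id: str) -> bool:
        """Read, then write: two separate steps, so this can race."""
        if email in self._rows:
            return False
        # Another client may register the same email right here.
        self._rows[email] = customer_id
        return True

    def insert_if_not_exists(self, email: str, customer_id: str) -> bool:
        """Check and write as one atomic step; exactly one caller wins."""
        with self._lock:
            if email in self._rows:
                return False
            self._rows[email] = customer_id
            return True

    def owner(self, email: str):
        return self._rows.get(email)
```

With the conditional insert the result of the write itself says whether the registration succeeded, so no separate read is needed beforehand.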
RECALL: the customers table
RECALL: the customers_by_email table