6. Last Write Win (LWW)
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
#partition
jdoe | age: 33 | name: John DOE
7. Last Write Win (LWW)
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
jdoe | age (t1): 33 | name (t1): John DOE
auto-generated timestamp t1 (μs)
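The write timestamp is stored per column and can be inspected with CQL's built-in WRITETIME() function (shown here on the users table from the slide):
SELECT age, WRITETIME(age) FROM users WHERE login = 'jdoe';
-- returns the value 33 plus its write timestamp in microseconds,
-- i.e. the t1 used for LWW resolution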
8. Last Write Win (LWW)
UPDATE users SET age = 34 WHERE login = 'jdoe';
SSTable1: jdoe | age (t1): 33 | name (t1): John DOE
SSTable2: jdoe | age (t2): 34
9. Last Write Win (LWW)
DELETE age FROM users WHERE login = 'jdoe';
SSTable1: jdoe | age (t1): 33 | name (t1): John DOE
SSTable2: jdoe | age (t2): 34
SSTable3: jdoe | age (t3): ✕ tombstone
10. Last Write Win (LWW)
SELECT age FROM users WHERE login = 'jdoe'; → ???
SSTable1: jdoe | age (t1): 33 | name (t1): John DOE
SSTable2: jdoe | age (t2): 34
SSTable3: jdoe | age (t3): ✕ tombstone
11. Last Write Win (LWW)
SELECT age FROM users WHERE login = 'jdoe';
✕ SSTable1: jdoe | age (t1): 33 | name (t1): John DOE
✕ SSTable2: jdoe | age (t2): 34
✓ SSTable3: jdoe | age (t3): ✕ tombstone (highest timestamp wins ☞ NULL returned)
31. CQL null semantics
Reading a null value means
• the value does not exist (has never been created)
• or the value was deleted (tombstone)
SELECT age FROM users WHERE login = 'jdoe'; → NULL
32. CQL null semantics
Writing null means
• delete the value (creates a tombstone)
• even if the value does not exist
UPDATE users SET age = NULL WHERE login = 'jdoe';
34. CQL null semantics
Seen in production: bound statement
preparedStatement.bind(33, …, null, null, null, …);
null ☞ tombstone creation on each update … (a workaround is sketched below)
jdoe | age: 33 | name: John DOE | geo_loc: ✕ | mood: ✕ | status: ✕ (tombstones)
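A driver-side way around this, assuming Cassandra 2.2+ (native protocol v4) and the DataStax Java driver 3.x: leave absent columns unset instead of binding null; unset variables write nothing, hence no tombstone. Statement and variable names are illustrative:
// session is an open com.datastax.driver.core.Session
PreparedStatement ps = session.prepare(
    "UPDATE users SET age = ?, geo_loc = ?, mood = ?, status = ? WHERE login = ?");
// with driver 3.x + protocol v4, bound variables start out unset
BoundStatement bs = ps.bind()
    .setInt("age", 33)
    .setString("login", "jdoe");
// geo_loc, mood, status deliberately left unset ☞ no write, no tombstone
session.execute(bs);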
37. Intensive update on same column
Data model
CREATE TABLE sensor_data (
  sensor_id bigint,
  value double,
  PRIMARY KEY(sensor_id));
sensor_id | value: 45.0034
38. Intensive update on same column
Updates
UPDATE sensor_data SET value = 45.0034 WHERE sensor_id = …;
UPDATE sensor_data SET value = 47.4182 WHERE sensor_id = …;
UPDATE sensor_data SET value = 48.0300 WHERE sensor_id = …;
sensor_id | value (t1): 45.0034
sensor_id | value (t13): 47.4182
sensor_id | value (t36): 48.0300
39. Intensive update on same column
Read
SELECT value FROM sensor_data WHERE sensor_id = …;
read N physical columns, only 1 useful … (until compaction)
sensor_id | value (t1): 45.0034
sensor_id | value (t13): 47.4182
sensor_id | value (t36): 48.0300
41. Intensive update on same column
Solution 1: leveled compaction! (if your I/O can keep up; see the ALTER TABLE below)
before compaction:
sensor_id | value (t1): 45.0034
sensor_id | value (t13): 47.4182
sensor_id | value (t36): 48.0300
after compaction:
sensor_id | value (t36): 48.0300
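Switching an existing table to leveled compaction is a one-line schema change:
ALTER TABLE sensor_data
WITH compaction = {'class': 'LeveledCompactionStrategy'};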
42. Intensive update on same column
Solution 2: reversed timeseries & DateTiered compaction strategy
CREATE TABLE sensor_data (
  sensor_id bigint,
  date timestamp,
  value double,
  PRIMARY KEY((sensor_id), date))
WITH CLUSTERING ORDER BY (date DESC)
AND compaction = {'class': 'DateTieredCompactionStrategy'};
43. Intensive update on same column
Data cleaning by configuring the strategy (base_time_seconds, example below)
SELECT value FROM sensor_data WHERE sensor_id = … LIMIT 1;
sensor_id | date3 (t3): 48.0300 | date2 (t2): 47.4182 | date1 (t1): 45.0034 | …
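A sketch of that configuration; base_time_seconds sizes the first (most recent) time window, 3600 being an illustrative value:
ALTER TABLE sensor_data
WITH compaction = {'class': 'DateTieredCompactionStrategy',
                   'base_time_seconds': 3600};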
44. Design around dynamic schema
Customer emergency call
• 3-node cluster almost full
• impossible to scale out
• 4th node stuck in JOINING state for 1 week
• disk space is filling up, production at risk!
45. Design around dynamic schema
After investigation
• 4th node stuck in JOINING state because streaming is stalled
• NPE (NullPointerException) in the logs
Cassandra source code to the rescue
47. Design around dynamic schema
public class CompressedStreamReader extends StreamReader
{
    …
    @Override
    public SSTableWriter read(ReadableByteChannel channel) throws IOException
    {
        …
        Pair<String, String> kscf = Schema.instance.getCF(cfId);
        // NPE here: getCF() returns null for a table that no longer exists,
        // so kscf.left blows up on the next line
        ColumnFamilyStore cfs = Keyspace.open(kscf.left).getColumnFamilyStore(kscf.right);
48. Design around dynamic schema
The truth is
• the devs dynamically drop & recreate tables every day
• dynamic schema is at the core of their design
Example:
DROP TABLE catalog_127_20140613;
CREATE TABLE catalog_127_20140614( … );
52. Design around dynamic schema
Nutshell
• dynamic schema change as a normal production operation is not recommended
• changing schema AND topology at the same time is an anti-pattern
66. Cassandra Time To Live
Time to live
• built-in feature
• insert data with a TTL in seconds
• expires server-side automatically
• ☞ use as a sliding window (example below)
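A quick illustration of both sides; the TTL() function returns the remaining lifetime of a column in seconds:
INSERT INTO users(login, name) VALUES('jdoe', 'John DOE') USING TTL 3600;
SELECT TTL(name) FROM users WHERE login = 'jdoe'; -- remaining seconds before expiry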
67. Rate limiting in action
Implementation
• threshold = max 3 password resets per user per sliding 24 h window
68. Rate limiting in action
Implementation
• when /password/reset is called
• check the threshold
• reached ☞ error message/ignore
• not reached ☞ log the attempt with TTL = 86400 (see the CQL sketch below)
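A minimal CQL sketch of the scheme; table and column names are illustrative, not from the deck:
CREATE TABLE password_reset_attempts (
  login text,
  attempt_time timeuuid,
  PRIMARY KEY((login), attempt_time));

-- on /password/reset: count the attempts still alive in the sliding 24 h window
SELECT count(*) FROM password_reset_attempts WHERE login = 'jdoe';

-- threshold not reached: log the attempt, auto-expiring after 24 h
INSERT INTO password_reset_attempts(login, attempt_time)
VALUES('jdoe', now()) USING TTL 86400;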
70. Anti Fraud
Real story
• many special offers available
• 30 mins international calls (50 countries)
• unlimited land-line calls to 5 countries
• …
72. Anti Fraud
Cassandra TTL
• when granting a new offer
INSERT INTO user_special_offer(login, offer_code, …)
VALUES('jdoe', '30_mins_international', …)
IF NOT EXISTS
USING TTL <offer_duration>;
75. Account Validation
How to?
• create the account with a 10-day TTL (864000 s)
INSERT INTO users(login, name, age)
VALUES('jdoe', 'John DOE', 33)
USING TTL 864000;
76. Account Validation
How to?
• create a random validation token, also with a 10-day TTL
INSERT INTO account_validation(token, login, name, age)
VALUES('A0F83E63DB935465CE73DFE…', 'jdoe', 'John DOE', 33)
USING TTL 864000;
77. Account Validation
On token validation
• check the token exists & retrieve the user details
SELECT login, name, age FROM account_validation
WHERE token = 'A0F83E63DB935465CE73DFE…';
• durably re-insert the user details, this time without a TTL
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
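Tying the flow together, a sketch in Java (DataStax driver 3.x style; session, token and the exception type are assumed/hypothetical):
// the row disappears by itself once the 10-day TTL has elapsed
Row row = session.execute(
    "SELECT login, name, age FROM account_validation WHERE token = ?", token).one();
if (row == null) {
    throw new InvalidTokenException("token expired or unknown"); // hypothetical exception
}
// durably re-insert the user, this time without a TTL
session.execute("INSERT INTO users(login, name, age) VALUES(?, ?, ?)",
    row.getString("login"), row.getString("name"), row.getInt("age"));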