1. ©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin
Chief Evangelist, DataStax
Advanced Cassandra
6. Cassandra is not…
A Data Ocean, Lake or Pond
An In-Memory Database
A Key-Value Store
A magical database unicorn that farts rainbows
7. When to use… Oracle, MySQL, Postgres or <RDBMS>
Loose data model (joins, sub-selects)
Absolute consistency (aka gotta have ACID)
No need to use anything else
You'll miss the long, candle lit dinners with your Oracle rep that always end with "what's your budget look like this year?"
8. When to use… Cassandra
Uptime is a top priority
Unpredictable or high scaling requirements
Workload is transactional
Willing to put the time or effort into understanding how Cassandra works and how to use it.
Use Oracle when you want to count your money.
Use Cassandra when you want to make money.
9. Copy n Paste your relational model
APACHE
CASSANDRA
26. User Auth
Step 1 Turn it on
cassandra.yaml
authorizer: CassandraAuthorizer        # replaces the default AllowAllAuthorizer
authenticator: PasswordAuthenticator   # replaces the default AllowAllAuthenticator
27. User Auth
cqlsh -u cassandra -p cassandra
Step 2 Create users
cqlsh> create user dude with password 'manager' superuser;
cqlsh> create user worker with password 'newhire';
cqlsh> list users;
name | super
----------+-------
cassandra | True
worker | False
dude | True
28. User Auth
cqlsh -u cassandra -p cassandra
Step 3 Grant permissions
cqlsh> create user ro_user with password '1234567';
cqlsh> grant all on killrvideo.user to dude;
cqlsh> grant select on killrvideo.user to ro_user;
31. How they work: Prepare
SELECT * FROM user WHERE id = ?
Ring: 10.0.0.1 (tokens 00-25), 10.0.0.2 (26-50), 10.0.0.3 (51-75), 10.0.0.4 (76-100)
Client -> Prepare -> Parsed, Hashed, Cached -> Prepared Statement
32. How they work: Bind
id = 1 + PreparedStatement Hash
Ring: 10.0.0.1 (tokens 00-25), 10.0.0.2 (26-50), 10.0.0.3 (51-75), 10.0.0.4 (76-100)
Client -> Bind & Execute -> Combine Pre-parsed Query and Variable -> Execute
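The prepare and bind steps on the two slides above can be sketched in plain Python. This is a toy model, not driver or server code: ToyNode, its prepare and execute methods, and the use of an MD5 digest of the query text as the statement id are all assumptions made for illustration.

```python
import hashlib


class ToyNode:
    """Toy stand-in for a node's prepared statement cache (hypothetical)."""

    def __init__(self):
        self.cache = {}        # statement id -> "parsed" query
        self.parse_count = 0   # how many times a query string was parsed

    def prepare(self, query):
        # Hash the query text to get a stable id, parse once, cache the result.
        stmt_id = hashlib.md5(query.encode("utf-8")).hexdigest()
        if stmt_id not in self.cache:
            self.parse_count += 1
            self.cache[stmt_id] = query.split("?")  # pretend this is the parse tree
        return stmt_id

    def execute(self, stmt_id, values):
        # Combine the pre-parsed query with the bound variables, then "execute".
        parts = self.cache[stmt_id]
        if len(values) != len(parts) - 1:
            raise ValueError("wrong number of bound values")
        pieces = [parts[0]]
        for value, tail in zip(values, parts[1:]):
            pieces.append(repr(value))
            pieces.append(tail)
        return "".join(pieces)


node = ToyNode()
stmt_id = node.prepare("SELECT * FROM user WHERE id = ?")
results = [node.execute(stmt_id, [i]) for i in (1, 2, 3)]
print(results[0])          # SELECT * FROM user WHERE id = 1
print(node.parse_count)    # 1
```

Only the small statement id and the bound values travel with each execute call; the query text is parsed a single time no matter how many values are bound afterwards.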
34. How to Prepare(Statements)
Java
PreparedStatement userSelect = session.prepare("SELECT * FROM user WHERE id = ?");
BoundStatement userSelectStatement = new BoundStatement(userSelect);
session.execute(userSelectStatement.bind(1));
Python
prepared_stmt = session.prepare("SELECT * FROM user WHERE id = ?")
bound_stmt = prepared_stmt.bind([1])
session.execute(bound_stmt)
35. Don't do this
for (int i = 1; i < 100; i++) {
    // Anti-pattern: the same statement is re-prepared on every iteration
    PreparedStatement userSelect = session.prepare("SELECT * FROM user WHERE id = ?");
    BoundStatement userSelectStatement = new BoundStatement(userSelect);
    session.execute(userSelectStatement.bind(1));
}
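For contrast, here is a minimal sketch of the intended pattern: prepare once outside the loop, then bind and execute inside it. CountingSession is a hypothetical stand-in that only mimics the shape of a driver session so the call counts can be observed without a cluster; it is not the DataStax driver.

```python
class CountingSession:
    """Hypothetical stand-in for a driver session; it only counts calls."""

    def __init__(self):
        self.prepare_calls = 0
        self.execute_calls = 0

    def prepare(self, query):
        self.prepare_calls += 1
        return query  # a real driver would return a prepared statement object

    def execute(self, prepared, values):
        self.execute_calls += 1
        return (prepared, tuple(values))


def lookup_users(session, ids):
    # Prepare once, outside the loop...
    user_select = session.prepare("SELECT * FROM user WHERE id = ?")
    # ...then bind a different value and execute on each iteration.
    return [session.execute(user_select, [user_id]) for user_id in ids]


session = CountingSession()
rows = lookup_users(session, range(1, 100))
print(session.prepare_calls, session.execute_calls)  # 1 99
```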
36. Execute vs Execute Async
• Very subtle difference
• Blocking vs non-blocking call
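The blocking versus non-blocking distinction can be sketched with the standard library alone. run_query is a hypothetical placeholder that sleeps to imitate a network round trip, and the thread pool stands in for the futures a driver's asynchronous execute call would hand back; none of this is driver code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(user_id):
    """Hypothetical query: sleeps briefly to imitate a network round trip."""
    time.sleep(0.01)
    return {"id": user_id}


def execute_blocking(ids):
    # Each call waits for its result before the next one is sent.
    return [run_query(i) for i in ids]


def execute_async(ids, pool):
    # Submit everything first (non-blocking), then collect the results.
    futures = [pool.submit(run_query, i) for i in ids]
    return [f.result() for f in futures]


ids = list(range(10))
with ThreadPoolExecutor(max_workers=10) as pool:
    blocking_rows = execute_blocking(ids)
    async_rows = execute_async(ids, pool)

print(blocking_rows == async_rows)  # True
```

Both paths return the same rows. The difference is that the asynchronous path has all ten requests in flight at once instead of paying for ten round trips back to back.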
42. Batch (Logged)
• All statements collected on client
• Sent in one shot
• All done on 1 node
Batch is accepted
All actions are logged on two replicas
Statements executed in sequence
Results are collected and returned
43. Batches: The good
• Great for denormalized inserts/updates
// Looking from the video side to many users
CREATE TABLE comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
// looking from the user side to many videos
CREATE TABLE comments_by_user (
userid uuid,
commentid timeuuid,
videoid uuid,
comment text,
PRIMARY KEY (userid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
44. Batches: The good
• Both inserts are run
• On failure, the batch log will replay
BEGIN BATCH
INSERT INTO comments_by_video (videoid, userid, commentid, comment)
VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0,d0f60aa8-54a9-4840-b70c-fe562b68842b,now(), 'Worst. Video. Ever.')
INSERT INTO comments_by_user (videoid, userid, commentid, comment)
VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0,d0f60aa8-54a9-4840-b70c-fe562b68842b,now(), 'Worst. Video. Ever.')
APPLY BATCH;
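The "log first, replay on failure" behaviour can be illustrated with a toy model. ToyCoordinator and its fail_after switch are hypothetical and exist only for this sketch; Cassandra's real batch log is replicated and considerably more involved.

```python
class ToyCoordinator:
    """Toy model of a logged batch (hypothetical; not Cassandra's implementation)."""

    def __init__(self):
        self.batchlog = {}   # batch id -> list of pending (table, key, value) writes
        self.tables = {}     # table name -> {key: value}
        self.next_id = 0

    def apply_batch(self, writes, fail_after=None):
        # 1. Log the whole batch before touching any table.
        batch_id = self.next_id
        self.next_id += 1
        self.batchlog[batch_id] = list(writes)
        # 2. Execute the statements in sequence; optionally "crash" part way.
        for n, (table, key, value) in enumerate(writes):
            if fail_after is not None and n >= fail_after:
                return False  # simulated failure: the log entry is left behind
            self.tables.setdefault(table, {})[key] = value
        # 3. Everything applied, so the log entry can go.
        del self.batchlog[batch_id]
        return True

    def replay(self):
        # Re-apply every batch still in the log. Writes are upserts, so
        # repeating one that already landed is harmless.
        for batch_id in list(self.batchlog):
            for table, key, value in self.batchlog.pop(batch_id):
                self.tables.setdefault(table, {})[key] = value


coord = ToyCoordinator()
comment = "Worst. Video. Ever."
writes = [
    ("comments_by_video", "video-1", comment),
    ("comments_by_user", "user-1", comment),
]
coord.apply_batch(writes, fail_after=1)   # only the first insert lands
coord.replay()                            # the log finishes the job
print(sorted(coord.tables))               # ['comments_by_user', 'comments_by_video']
```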
45. Batches: The bad
"I was doing a load test and nodes started blinking offline"
"Were you using a batch by any chance?"
"Why yes I was! How did you know?"
"How big was each batch?"
"1000 inserts each"
46. Batches: The bad
BEGIN BATCH
1000 inserts
APPLY BATCH;
Ring: 10.0.0.1 (tokens 00-25), 10.0.0.2 (26-50), 10.0.0.3 (51-75), 10.0.0.4 (76-100)
Client
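A rough sketch of why the large batch hurts, using the four-node ring from the slide. The token_for partitioner (MD5 folded onto a 0-100 token space) is invented for this example, and picking 10.0.0.1 as the batch coordinator is arbitrary; a real cluster uses a different partitioner and a far larger token space.

```python
import hashlib
from collections import Counter

# Token ranges taken from the ring diagram (inclusive upper bounds).
RING = [(25, "10.0.0.1"), (50, "10.0.0.2"), (75, "10.0.0.3"), (100, "10.0.0.4")]


def token_for(key):
    """Hypothetical partitioner: map a key onto the toy 0-100 token space."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % 101


def owner_of(key):
    token = token_for(key)
    for upper_bound, node in RING:
        if token <= upper_bound:
            return node
    raise AssertionError("token outside ring")


keys = range(1000)

# One 1000-insert batch: a single coordinator receives every statement.
batch_load = Counter({"10.0.0.1": len(keys)})

# 1000 individual inserts, each sent to the node that owns its token.
spread_load = Counter(owner_of(k) for k in keys)

print(batch_load)
print(sorted(spread_load.items()))
```

In the batch case one node has to hold and fan out all 1000 statements; with individual inserts the same work is divided among the nodes that own the data.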
47. Batches: The rules
• Keep them small and for atomicity
CASSANDRA-6487 - Warn on large batches (5 KB default)
CASSANDRA-8011 - Fail on large batches (50 KB default)
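A client-side guard along these lines might look like the sketch below. The two thresholds are the defaults quoted above; batch_size_bytes and check_batch are hypothetical helpers, and measuring the length of the statement text is only a rough proxy for the size the server actually checks.

```python
WARN_BYTES = 5 * 1024    # warn threshold from CASSANDRA-6487
FAIL_BYTES = 50 * 1024   # fail threshold from CASSANDRA-8011


def batch_size_bytes(statements):
    """Rough client-side estimate: the UTF-8 length of the statement text."""
    return sum(len(s.encode("utf-8")) for s in statements)


def check_batch(statements):
    size = batch_size_bytes(statements)
    if size > FAIL_BYTES:
        return "fail"
    if size > WARN_BYTES:
        return "warn"
    return "ok"


insert = "INSERT INTO comments_by_video (videoid, commentid, comment) VALUES (?, now(), 'x')"
print(check_batch([insert] * 2))      # ok
print(check_batch([insert] * 100))    # warn
print(check_batch([insert] * 1000))   # fail
```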
49. Old Row cache: The problem
• Reads an entire storage row of data
ID = 1 (Partition Key / Storage Row Key)
2014-09-08 12:00:00 : name = SFO | 2014-09-08 12:00:00 : temp = 63.4
2014-09-08 12:01:00 : name = SFO | 2014-09-08 12:01:00 : temp = 63.9
2014-09-08 12:02:00 : name = SFO | 2014-09-08 12:02:00 : temp = 64.0
Figure labels: "Need this" marks a subset of the cells; "Caches this" marks the entire storage row.
50. New Row Cache: The solution
• Stores just a few CQL rows
ID = 1 (Partition Key / Storage Row Key)
2014-09-08 12:00:00 : name = SFO | 2014-09-08 12:00:00 : temp = 63.4
2014-09-08 12:01:00 : name = SFO | 2014-09-08 12:01:00 : temp = 63.9
2014-09-08 12:02:00 : name = SFO | 2014-09-08 12:02:00 : temp = 64.0
Figure labels: "Need this" and "Caches this" both mark just a few CQL rows.
51. Using row cache
CREATE TABLE user_search_history_with_cache (
id int,
search_time timestamp,
search_text text,
search_results int,
PRIMARY KEY (id, search_time)
) WITH CLUSTERING ORDER BY (search_time DESC)
AND caching = { 'keys' : 'ALL', 'rows_per_partition' : '20' };
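The effect of rows_per_partition can be sketched with a toy cache that keeps only the head of each partition. ToyRowCache, its read method, and the in-memory dict standing in for disk are assumptions made for this example, not Cassandra internals.

```python
class ToyRowCache:
    """Toy model: cache only the first N rows of each partition (hypothetical)."""

    def __init__(self, rows_per_partition):
        self.rows_per_partition = rows_per_partition
        self.cache = {}      # partition key -> first N rows in clustering order
        self.disk_reads = 0

    def read(self, table, partition_key, limit):
        head = self.cache.get(partition_key)
        if head is not None and limit <= len(head):
            return head[:limit]                  # served from the cache
        self.disk_reads += 1
        rows = table[partition_key]              # pretend this hits disk
        self.cache[partition_key] = rows[: self.rows_per_partition]
        return rows[:limit]


# One partition (id = 1) holding 60 searches, newest first to match
# CLUSTERING ORDER BY (search_time DESC).
table = {1: [("2014-09-08 12:%02d:00" % m, "search %d" % m) for m in range(59, -1, -1)]}

cache = ToyRowCache(rows_per_partition=20)
cache.read(table, 1, limit=10)   # miss: reads disk, caches the newest 20 rows
cache.read(table, 1, limit=10)   # hit: answered from the cached head
cache.read(table, 1, limit=50)   # wants more than 20 rows, so back to disk
print(cache.disk_reads)          # 2
```

A "latest N searches" query with N at or below 20 stays in memory, while the rest of a long partition never occupies cache space.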