Who wants to be a Cassandra Millionaire?
40 minutes of best practice: getting you ready for certification
@VictorFAnjos
Toronto Cassandra Day
© 2015. All Rights Reserved. @VictorFAnjos
[Prize ladder: 15 questions, from 100 up to 1 Million]
Welcome to Who Wants to be a Cassandra Millionaire
This storage medium allows for best performance.
A: NAS / SAN
B: SSD
C: DAS SATA
D: DAS SCSI
Installation and considerations
how to store the datastore
Storage Area Network / Solid State Drive
Local (DAS), iSCSI, Fibre Channel
● AVOID network storage like the plague
● Direct Attached Storage FTW
● Disk latency is a HUGE deal for performance
SATA/SAS DAS
PCIe/NVMe DAS
When using SSDs, this filesystem type is best.
A: ZFS
B: Btrfs
C: Ext4
D: F2FS
Congratulations! You’ve reached the 1,000 ops/s milestone!
Installation and considerations
i can’t believe it’s not btrfs
● ext4 is the easiest to use (it ships with most Linux distros), but F2FS gets 5-10% gains in write performance
● if NOT using F2FS, make sure to TRIM
● multiple disks → use RAID0
This is the sweet spot for SWAP when using C*.
A: 0
B: ½ of HEAP
C: Equal to HEAP
D: Equal to RAM
Installation and considerations
to swap or not to swap
Having 64G of RAM means you should optimize to have ___G of HEAP.
A: 64G
B: 32G
C: 16G
D: 8G
Installation and considerations
how much heap?
http://docs.datastax.com/en/cassandra/1.2/cassandra/operations/ops_tune_jvm_c.html
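The linked tuning page describes how the default maximum heap is derived from system RAM. A minimal sketch of that rule, assuming the formula used by the stock cassandra-env.sh of that era, max(min(1/2 RAM, 1 GB), min(1/4 RAM, 8 GB)). The function name `default_max_heap_mb` is made up for illustration.

```python
def default_max_heap_mb(system_ram_mb: int) -> int:
    """Default max heap in MB for a machine with the given RAM (assumed rule)."""
    # Small machines: half of RAM, capped at 1 GB.
    half_capped = min(system_ram_mb // 2, 1024)
    # Larger machines: a quarter of RAM, capped at 8 GB.
    quarter_capped = min(system_ram_mb // 4, 8192)
    # The larger of the two candidates wins.
    return max(half_capped, quarter_capped)


if __name__ == "__main__":
    for ram_gb in (2, 8, 32, 64):
        heap_mb = default_max_heap_mb(ram_gb * 1024)
        print(f"{ram_gb}G RAM -> {heap_mb} MB heap")
```

Under this rule the heap stops growing once RAM passes 32G; the remaining memory is left to the operating system's page cache.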
Definitely DO NOT use this snitch in Multi-DC environments.
A: EC2Snitch
B: Dynamic Snitch
C: Simple Snitch
D: Property File Snitch
Installation and considerations
son of a snitch
To reduce latency and wire time to my app, I should opt for:
A: Synchronous AND Full Queries
B: Asynchronous AND Prepared Statements
C: Synchronous AND Prepared Statements
D: Asynchronous AND Full Queries
Achieving performance through code/drivers
should I stay or should I go
MULTI DC
● Client writes to any Cassandra node
● Coordinator node replicates to other nodes (in local and remote Data Center)
● Local write acks returned to coordinator
● Client gets ack when enough total nodes are committed
● Data written to internal commit log disks
● When data arrives, remote node replicates data
● Ack direct to source region coordinator
● Remote copies written to commit log disks
If a node or region goes offline, hinted handoff completes the write when the node comes back up (as long as there are enough nodes to satisfy consistency level).
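How many acks count as "enough" depends on the consistency level. A rough sketch of the arithmetic for three common levels, assuming per-data-center replication factors as used with NetworkTopologyStrategy. The function `required_acks`, the `"*"` marker for "any data center", and the data center names are illustrative only.

```python
def required_acks(consistency: str, rf_by_dc: dict, local_dc: str) -> dict:
    """Map each scope to the number of replica acks the coordinator waits for.

    The key "*" means the acks may come from any data center.
    """
    def quorum(replicas: int) -> int:
        return replicas // 2 + 1

    if consistency == "QUORUM":
        # Majority of all replicas across every data center.
        return {"*": quorum(sum(rf_by_dc.values()))}
    if consistency == "LOCAL_QUORUM":
        # Majority of the replicas in the coordinator's own data center.
        return {local_dc: quorum(rf_by_dc[local_dc])}
    if consistency == "EACH_QUORUM":
        # Majority of the replicas in every data center.
        return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}
    raise ValueError(f"unsupported consistency level: {consistency}")


if __name__ == "__main__":
    rf = {"east": 3, "west": 3}
    for level in ("QUORUM", "LOCAL_QUORUM", "EACH_QUORUM"):
        print(level, required_acks(level, rf, "east"))
```

With three replicas in each of two data centers, LOCAL_QUORUM needs only two local acks, so the client does not wait on the remote region.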
Prepare ONCE...
Bind and Execute multiple times.
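A minimal sketch of "prepare once, bind and execute many times", combined with asynchronous execution. It assumes a `session` object shaped like the DataStax Python driver's `Session` (with `prepare` and `execute_async`), and a hypothetical `users (user, email, state)` table. The function name `insert_users` is illustrative.

```python
def insert_users(session, rows):
    """Insert (user, email, state) tuples using one prepared statement."""
    # Prepare ONCE: the CQL string is parsed a single time; later
    # executions only send the statement id and the bound values.
    insert = session.prepare(
        "INSERT INTO users (user, email, state) VALUES (?, ?, ?)"
    )
    # Bind and execute many times. execute_async returns a future
    # immediately instead of blocking on each round trip.
    futures = [session.execute_async(insert, row) for row in rows]
    # Wait for every write; result() raises if that write failed.
    for future in futures:
        future.result()
    return len(futures)
```

For very large inputs one would normally cap the number of in-flight requests rather than submit every row at once.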
With RF=2 and CL=Quorum, operations failed when 1 node went down because of this.
A: 1 / 1 = 1
B: 2 / 1 = 2
C: 2 * 1 = 2
D: 2 / 2 + 1 = 2
Congratulations! You’ve reached the 32,000 ops/s milestone!
Achieving performance through code/drivers
when friends aren’t enough
Replication Factor = 3
Insert into a cluster of size 6 with
consistency Quorum
Two nodes in token range must be
present for write to succeed
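The quorum arithmetic behind this slide can be sketched in a few lines: a quorum is floor(RF / 2) + 1 replicas. The helper names below are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for CL=QUORUM."""
    return replication_factor // 2 + 1


def can_satisfy_quorum(replication_factor: int, replicas_down: int) -> bool:
    """True if enough replicas of a token range are still up for QUORUM."""
    replicas_up = replication_factor - replicas_down
    return replicas_up >= quorum(replication_factor)


if __name__ == "__main__":
    for rf in (2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"survives one replica down: {can_satisfy_quorum(rf, 1)}")
```

Note that cluster size does not enter the calculation: only the replicas that own the token range matter, which is why RF=3 tolerates one replica down while RF=2 tolerates none.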
What happens now?
Cannot achieve consistency level QUORUM
Nodes in partition key DOWN
This mathematical and CS concept helps when data modeling for query optimization.
A: Truth table
B: Brewer’s Theorem
C: CAP Theorem
D: Entropy
Data modelling, CQLSH and more
the truth shall set you free
Motivated by CS, Math, Engineering
Allows for creating building blocks that yield a single output
More complex truth tables can arise
How about searching for username?
And what about full_name?
user_stream, partition key choices (1 = column is the partition key)
user_id   username   full_name
1         0          0
0         1          0
0         0          1
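One reading of the truth table above is that each row selects the partition key for one query-specific copy of the table. A small sketch under that assumption, generating one CREATE TABLE statement per row. The `<base>_by_<column>` naming and the `text` column type are placeholders, not taken from the slides.

```python
COLUMNS = ("user_id", "username", "full_name")
TRUTH_TABLE = [
    (1, 0, 0),  # look up by user_id
    (0, 1, 0),  # look up by username
    (0, 0, 1),  # look up by full_name
]


def tables_for(base, columns, truth_table):
    """Build one CREATE TABLE statement per truth-table row."""
    statements = []
    for row in truth_table:
        # Columns flagged with 1 form the partition key of this table.
        keys = [col for col, bit in zip(columns, row) if bit]
        name = f"{base}_by_{'_'.join(keys)}"
        column_defs = ", ".join(f"{col} text" for col in columns)
        key_list = ", ".join(keys)
        statements.append(
            f"CREATE TABLE {name} ({column_defs}, PRIMARY KEY (({key_list})));"
        )
    return statements


if __name__ == "__main__":
    for statement in tables_for("user_stream", COLUMNS, TRUTH_TABLE):
        print(statement)
```

The sketch ignores clustering columns; a table keyed only by full_name would, in practice, likely need user_id as a clustering column so that users sharing a name do not overwrite each other.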
A shift in paradigms: what should you maximize, and what should you reduce, for best performance?
A: Reads / Batches
B: Writes / Batches
C: Writes / Deletes
D: Reads / Deletes
Data modelling, CQLSH and more
do the write thing
memtable --- < 100ns
commit log --- ~ 1 ms
DELETES / TTL cause compactions
To avoid hitting the 2B record limit (per row), this term borrowed from RDBMSs can still make sense.
A: ACID
B: Vector
C: Rollback
D: Sharding
Data modelling, CQLSH and more
sit on this and rotate
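One common way to keep a single row from growing without bound is to add a bucket to the partition key, so that one logical row is spread over several physical ones. A minimal sketch, assuming a hash-based bucket; the names `bucket_for` and `partition_key` and the default of 16 buckets are illustrative. A time-based bucket (for example, one per day) is a frequent alternative.

```python
import zlib


def bucket_for(key: str, buckets: int) -> int:
    """Deterministically map a key to one of `buckets` shards."""
    # crc32 is stable across runs, unlike Python's built-in hash().
    return zlib.crc32(key.encode("utf-8")) % buckets


def partition_key(stream_id: str, event_id: str, buckets: int = 16):
    """Compound partition key (stream_id, bucket) for one event."""
    return (stream_id, bucket_for(event_id, buckets))


if __name__ == "__main__":
    for event in ("e1", "e2", "e3"):
        print(event, partition_key("stream-42", event))
```

The trade-off is on the read side: fetching the whole logical row means querying every bucket and merging the results.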
Many say to use these sparingly; I would say avoid them like the plague.
A: Batches
B: Synchronous
C: Secondary Indexes
D: MySQL
Performance must-haves
never be second best
Writes are distributed across the cluster.
Each partition key refers to one exact position from which to get a row.
But what do we do when we don’t have exactly the right type of index for a query?
CREATE TABLE users (
    user varchar,
    email varchar,
    state varchar,
    PRIMARY KEY (user));
-- OPTION 1 : create an index
CREATE INDEX idxUBS ON users (state);
-- OPTION 2 : create another table (store data twice)
CREATE TABLE usersByState (
    state varchar,
    user varchar,
    PRIMARY KEY (state, user));
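With OPTION 2 the application takes over the work an index would do: every user is written to both tables, and lookups by state go to `usersByState`. A small sketch of the statements involved, returned as (CQL, parameters) pairs that a driver session could prepare and execute. The helper names are illustrative.

```python
def user_writes(user: str, email: str, state: str):
    """Return the (cql, params) pairs that keep both tables in sync."""
    return [
        # Main table, keyed by user.
        ("INSERT INTO users (user, email, state) VALUES (?, ?, ?)",
         (user, email, state)),
        # Query table, partitioned by state with user as clustering column.
        ("INSERT INTO usersByState (state, user) VALUES (?, ?)",
         (state, user)),
    ]


def users_in_state_query(state: str):
    """Single-partition read that replaces the secondary-index lookup."""
    return ("SELECT user FROM usersByState WHERE state = ?", (state,))


if __name__ == "__main__":
    for cql, params in user_writes("victor", "victor@example.com", "ON"):
        print(cql, params)
    print(*users_in_state_query("ON"))
```

The read hits exactly one partition, whereas a query through the index on `state` generally has to consult many nodes.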
This addition to C* can help with ACID-like transactions, at the cost of a bit of a performance hit.
A: UDT
B: Lightweight Transactions
C: JSON
D: Hinted handoff
Performance must-haves
slimfast agreement
Proposer
Prepares a proposal that is sent to a number of Acceptors.
Waits for an acknowledgement (in the form of a promise) from the Acceptors.
Sends an accept message to a Quorum of Acceptors with the new value to commit.
Returns success/completion to the client.
Acceptor
Determines if the proposal is newer than what it has seen.
Acknowledges with the highest proposal value it has seen AND the current value (of what is to be set).
Receives the message to commit the new value.
Accepts and returns on successful commit of the value.
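The steps above follow the shape of Paxos. A minimal in-memory sketch of one round of classic single-decree Paxos, assuming integer ballots and synchronous calls. `Acceptor` and `propose` are illustrative names, and Cassandra's own lightweight transactions involve additional steps that are not modeled here.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0          # highest ballot promised so far
        self.accepted = (0, None)  # (ballot, value) last accepted

    def prepare(self, ballot):
        """Promise to ignore older ballots; report what was accepted so far."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, self.accepted

    def accept(self, ballot, value):
        """Accept the value unless a newer ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors, ballot, value):
    """Run one round; return the chosen value, or None without a quorum."""
    quorum = len(acceptors) // 2 + 1
    # Phase 1: send prepare, collect promises.
    replies = [acceptor.prepare(ballot) for acceptor in acceptors]
    promises = [accepted for ok, accepted in replies if ok]
    if len(promises) < quorum:
        return None
    # If some acceptor already accepted a value, adopt the one with the
    # highest ballot instead of our own.
    _, prior_value = max(promises, key=lambda accepted: accepted[0])
    if prior_value is not None:
        value = prior_value
    # Phase 2: ask the acceptors to accept the value.
    acks = sum(acceptor.accept(ballot, value) for acceptor in acceptors)
    return value if acks >= quorum else None


if __name__ == "__main__":
    nodes = [Acceptor() for _ in range(3)]
    print(propose(nodes, 1, "a"))  # first proposal
    print(propose(nodes, 2, "b"))  # later ballot, value already chosen
    print(propose(nodes, 1, "c"))  # stale ballot
```

The extra round trips are where the performance hit comes from: every conditional write pays for at least a prepare and an accept exchange with a quorum of replicas.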
Thank you
