©2013 DataStax Confidential. Do not distribute without consent.
@chbatey
Christopher Batey

Technical Evangelist for Apache Cassandra
Avoiding anti-patterns: Staying in love with
Cassandra
@chbatey
Who am I?
• Technical Evangelist for Apache Cassandra
• Work on Stubbed Cassandra
• Help out Apache Cassandra users
• Built systems using Java/Spring/Dropwizard
with Cassandra @ Sky
• Follow me on twitter @chbatey
@chbatey
Anti patterns
• Client side joins
• Multi partition queries
• Batches
• Mutable data
• Retry policies
• Tombstones
• Secondary indices
• Includes homework + a prize
@chbatey
Distributed joins
@chbatey
Cassandra cannot join or aggregate
[diagram: client asking "Where do I go for the max?"]
@chbatey
Storing customer events
• Customer event
- customer_id - ChrisBatey
- staff_id - Charlie
- event_type - login, logout, add_to_basket, remove_from_basket
- time
• Store
- name
- store_type - Website, PhoneApp, Phone, Retail
- location
@chbatey
Model
CREATE TABLE customer_events(
customer_id text,
staff_id text,
time timeuuid,
event_type text,
store_name text,
PRIMARY KEY ((customer_id), time));
CREATE TABLE store(
store_name text,
location text,
store_type text,
PRIMARY KEY (store_name));
@chbatey
Reading this
[diagram: client → coordinator (C) across an RF 3 cluster, CL = QUORUM]
1. Read latest customer event
2. Read store information
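To make the anti-pattern concrete, here is a minimal sketch of that client-side join in application code, assuming the DataStax Java driver and the two tables from the Model slide (the session and the literal customer id are placeholders):

// Two dependent round trips: each needs its own coordinator and replica set
Row latestEvent = session.execute(
        "SELECT * FROM customer_events WHERE customer_id = 'chbatey' LIMIT 1").one();
String storeName = latestEvent.getString("store_name");

// The follow-up query cannot start until the first one has returned
Row store = session.execute(
        "SELECT * FROM store WHERE store_name = ?", storeName).one();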
@chbatey
One to one vs one to many
• This required possibly 6 out of 8 of our nodes to be up
• Imagine if there were multiple follow up queries
@chbatey
Multi-partition queries
• Client side joins normally end up falling into the more
general anti-pattern of multi partition queries
@chbatey
Adding staff link
CREATE TABLE customer_events(
customer_id text,
staff set<text>,
time timeuuid,
event_type text,
store_name text,
PRIMARY KEY ((customer_id), time));
CREATE TYPE staff(
name text,
favourite_colour text,
job_title text);
CREATE TYPE store(
store_name text,
location text,
store_type text);
@chbatey
Queries
select * from customer_events where customer_id = 'chbatey' limit 1
select * from staff where name in (staff1, staff2, staff3, staff4)
If any one of these fails, the whole query fails
@chbatey
Use a multi node cluster locally
@chbatey
Use a multi node cluster locally
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns  Host ID                               Rack
UN  127.0.0.1  102.27 KB  256     ?     15ad7694-3e76-4b74-aea0-fa3c0fa59532  rack1
UN  127.0.0.2  102.18 KB  256     ?     cca7d0bb-e884-49f9-b098-e38fbe895cbc  rack1
UN  127.0.0.3  93.16 KB   256     ?     1f9737d3-c1b8-4df1-be4c-d3b1cced8e30  rack1
UN  127.0.0.4  102.1 KB   256     ?     fe27b958-5d3a-4f78-9880-76cb7c9bead1  rack1
UN  127.0.0.5  93.18 KB   256     ?     66eb3f23-8889-44d6-a9e7-ecdd57ed61d0  rack1
UN  127.0.0.6  102.12 KB  256     ?     e2e99a7b-c1fb-4f2a-9e4f-7a4666f8245e  rack1
@chbatey
Let’s see with a 6 node cluster
INSERT INTO staff (name, favourite_colour , job_title ) VALUES ( 'chbatey', 'red',
'Technical Evangelist' );
INSERT INTO staff (name, favourite_colour , job_title ) VALUES ( 'luket', 'red',
'Technical Evangelist' );
INSERT INTO staff (name, favourite_colour , job_title ) VALUES ( 'jonh', 'blue',
'Technical Evangelist' );
select * from staff where name in ('chbatey', 'luket', 'jonh');
@chbatey
Trace with CL ONE = 4 nodes used
Execute CQL3 query | 2015-02-02 06:39:58.759000 | 127.0.0.1 | 0
Parsing select * from staff where name in ('chbatey', 'luket', 'jonh'); [SharedPool-Worker-1] | 2015-02-02 06:39:58.766000 | 127.0.0.1 | 7553
Preparing statement [SharedPool-Worker-1] | 2015-02-02 06:39:58.768000 | 127.0.0.1 | 9249
Executing single-partition query on staff [SharedPool-Worker-3] | 2015-02-02 06:39:58.773000 | 127.0.0.1 | 14255
Sending message to /127.0.0.3 [WRITE-/127.0.0.3] | 2015-02-02 06:39:58.773001 | 127.0.0.1 | 14756
Sending message to /127.0.0.5 [WRITE-/127.0.0.5] | 2015-02-02 06:39:58.773001 | 127.0.0.1 | 14928
Sending message to /127.0.0.3 [WRITE-/127.0.0.3] | 2015-02-02 06:39:58.774000 | 127.0.0.1 | 16035
Executing single-partition query on staff [SharedPool-Worker-1] | 2015-02-02 06:39:58.777000 | 127.0.0.5 | 1156
Enqueuing response to /127.0.0.1 [SharedPool-Worker-1] | 2015-02-02 06:39:58.777001 | 127.0.0.5 | 1681
Sending message to /127.0.0.1 [WRITE-/127.0.0.1] | 2015-02-02 06:39:58.778000 | 127.0.0.5 | 1944
Executing single-partition query on staff [SharedPool-Worker-1] | 2015-02-02 06:39:58.778000 | 127.0.0.3 | 1554
Processing response from /127.0.0.5 [SharedPool-Worker-3] | 2015-02-02 06:39:58.779000 | 127.0.0.1 | 20762
Enqueuing response to /127.0.0.1 [SharedPool-Worker-1] | 2015-02-02 06:39:58.779000 | 127.0.0.3 | 2425
Sending message to /127.0.0.5 [WRITE-/127.0.0.5] | 2015-02-02 06:39:58.779000 | 127.0.0.1 | 21198
Sending message to /127.0.0.1 [WRITE-/127.0.0.1] | 2015-02-02 06:39:58.779000 | 127.0.0.3 | 2639
Sending message to /127.0.0.6 [WRITE-/127.0.0.6] | 2015-02-02 06:39:58.779000 | 127.0.0.1 | 21208
Executing single-partition query on staff [SharedPool-Worker-1] | 2015-02-02 06:39:58.780000 | 127.0.0.5 | 304
Enqueuing response to /127.0.0.1 [SharedPool-Worker-1] | 2015-02-02 06:39:58.780001 | 127.0.0.5 | 574
Executing single-partition query on staff [SharedPool-Worker-2] | 2015-02-02 06:39:58.781000 | 127.0.0.3 | 4075
Sending message to /127.0.0.1 [WRITE-/127.0.0.1] | 2015-02-02 06:39:58.781000 | 127.0.0.5 | 708
Enqueuing response to /127.0.0.1 [SharedPool-Worker-2] | 2015-02-02 06:39:58.781001 | 127.0.0.3 | 4348
Sending message to /127.0.0.1 [WRITE-/127.0.0.1] | 2015-02-02 06:39:58.782000 | 127.0.0.3 | 5371
Executing single-partition query on staff [SharedPool-Worker-1] | 2015-02-02 06:39:58.783000 | 127.0.0.6 | 2463
Enqueuing response to /127.0.0.1 [SharedPool-Worker-1] | 2015-02-02 06:39:58.784000 | 127.0.0.6 | 2905
Sending message to /127.0.0.1 [WRITE-/127.0.0.1] | 2015-02-02 06:39:58.784001 | 127.0.0.6 | 3160
Processing response from /127.0.0.6 [SharedPool-Worker-2] | 2015-02-02 06:39:58.785000 | 127.0.0.1 | --
Request complete | 2015-02-02 06:39:58.782995 | 127.0.0.1 | 23995
@chbatey
Denormalise with UDTs
CREATE TYPE store (name text, type text, postcode text);
CREATE TYPE staff (name text, fav_colour text, job_title text);


CREATE TABLE customer_events(
customer_id text,
time timeuuid,
event_type text,
store store,
staff set<staff>,
PRIMARY KEY ((customer_id), time));
User defined types
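As a small sketch of the payoff (assuming driver 2.1+, which surfaces user defined types as UDTValue), a single partition read now returns the event together with its embedded store and staff details:

Row event = session.execute(
        "SELECT * FROM customer_events WHERE customer_id = 'chbatey' LIMIT 1").one();
UDTValue store = event.getUDTValue("store");  // no second query to the store table
System.out.println(event.getString("event_type") + " @ " + store.getString("name"));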
@chbatey
Less obvious example
• A good pattern for time series: Bucketing
@chbatey
Adding a time bucket
CREATE TABLE customer_events_bucket(
customer_id text,
time_bucket text,
time timeuuid,
event_type text,
store store,
staff set<staff>,
PRIMARY KEY ((customer_id, time_bucket), time));
@chbatey
Queries
select * from customer_events_bucket where customer_id = 'chbatey'
and time_bucket IN ('2015-01-01|0910:01', '2015-01-01|0910:02', '2015-01-01|0910:03', '2015-01-01|0910:04')
Often better as multiple async queries
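A sketch of the "multiple async queries" alternative, assuming the DataStax Java driver (executeAsync returns a ResultSetFuture) and a hypothetical prepared statement selectByBucket for SELECT * FROM customer_events_bucket WHERE customer_id = ? AND time_bucket = ?:

List<ResultSetFuture> futures = new ArrayList<>();
for (String bucket : Arrays.asList("2015-01-01|0910:01", "2015-01-01|0910:02")) {
    // each single-partition read is routed independently by the token-aware driver
    futures.add(session.executeAsync(selectByBucket.bind("chbatey", bucket)));
}
for (ResultSetFuture future : futures) {
    for (Row row : future.getUninterruptibly()) {
        // handle the row; one slow or failed bucket no longer fails the whole IN query
    }
}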
@chbatey
Tips for avoiding joins & multi gets
• Say no to client side joins by denormalising
- Much easier with UDTs
• When bucketing aim for at most two buckets for a query
• Get in the habit of reading trace + using a multi node cluster locally (see the sketch below)
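For the trace tip, a sketch of pulling the trace programmatically once tracing is enabled on a statement (QueryTrace hangs off the ExecutionInfo; selectByBucket is the hypothetical prepared statement from the sketch above):

Statement query = selectByBucket.bind("chbatey", "2015-01-01|0910:01").enableTracing();
ResultSet rs = session.execute(query);
QueryTrace trace = rs.getExecutionInfo().getQueryTrace();
for (QueryTrace.Event event : trace.getEvents()) {
    // description | source node | microseconds elapsed on that node
    System.out.printf("%s | %s | %d%n",
            event.getDescription(), event.getSource(), event.getSourceElapsedMicros());
}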
@chbatey
Unlogged Batches
@chbatey
Unlogged Batches
• Unlogged batches
- Send many statements as one
@chbatey
Customer events table
CREATE TABLE IF NOT EXISTS customer_events (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (customer_id, time));
@chbatey
Inserting events
INSERT INTO events.customer_events (customer_id, time, event_type, staff_id, store_type) VALUES (?, ?, ?, ?, ?)

public void storeEvent(ConsistencyLevel consistencyLevel, CustomerEvent customerEvent) {
    BoundStatement boundInsert = insertStatement.bind(
            customerEvent.getCustomerId(),
            customerEvent.getTime(),
            customerEvent.getEventType(),
            customerEvent.getStaffId(),
            customerEvent.getStoreType());  // bound in the same order as the INSERT column list
    boundInsert.setConsistencyLevel(consistencyLevel);
    session.execute(boundInsert);
}
@chbatey
Batch insert - Good idea?
public void storeEvents(ConsistencyLevel consistencyLevel, CustomerEvent... events) {
    BatchStatement batchStatement = new BatchStatement(BatchStatement.Type.UNLOGGED);
    for (CustomerEvent event : events) {
        BoundStatement boundInsert = insertStatement.bind(
                event.getCustomerId(),
                event.getTime(),
                event.getEventType(),
                event.getStaffId(),
                event.getStoreType());
        batchStatement.add(boundInsert);
    }
    session.execute(batchStatement);
}
@chbatey
Not so much
[diagram: client sends the whole batch to a single coordinator (C)]
• Very likely to fail if nodes are down / overloaded
• Gives far too much work to the coordinator
• Ruins the performance gains of a token-aware driver
@chbatey
Individual queries
[diagram: client issues the statements individually]
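A sketch of what the individual-queries approach looks like for the same inserts, sent asynchronously (accessor names such as getStoreType() mirror the earlier examples and are assumptions about the CustomerEvent class):

List<ResultSetFuture> futures = new ArrayList<>();
for (CustomerEvent event : events) {
    // each statement carries its own partition key, so the token-aware driver
    // sends it straight to a replica instead of piling work on one coordinator
    futures.add(session.executeAsync(insertStatement.bind(
            event.getCustomerId(), event.getTime(), event.getEventType(),
            event.getStaffId(), event.getStoreType())));
}
for (ResultSetFuture future : futures) {
    future.getUninterruptibly();  // surfaces any per-statement failure
}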
@chbatey
Unlogged batch use case
public void storeEvents(String customerId, ConsistencyLevel consistencyLevel, CustomerEvent... events) {
    BatchStatement batchStatement = new BatchStatement(BatchStatement.Type.UNLOGGED);
    batchStatement.enableTracing();
    for (CustomerEvent event : events) {
        BoundStatement boundInsert = insertStatement.bind(
                customerId,  // the same partition key for every statement in the batch
                event.getTime(),
                event.getEventType(),
                event.getStaffId(),
                event.getStoreType());
        boundInsert.enableTracing();
        boundInsert.setConsistencyLevel(consistencyLevel);
        batchStatement.add(boundInsert);
    }
    ResultSet execute = session.execute(batchStatement);
    logTraceInfo(execute.getExecutionInfo());
}
Partition Key!!
@chbatey
Writing this
[diagram: client sends the batch to one coordinator (C)]
All for the same partition :)
@chbatey
Logged Batches
@chbatey
Logged Batches
• Once accepted the statements will eventually succeed
• Achieved by writing them to a distributed batchlog
• 30% slower than unlogged batches
@chbatey
Batch insert - Good idea?
public void storeEvents(ConsistencyLevel consistencyLevel, CustomerEvent... events) {
    BatchStatement batchStatement = new BatchStatement(BatchStatement.Type.LOGGED);
    for (CustomerEvent event : events) {
        BoundStatement boundInsert = insertStatement.bind(
                event.getCustomerId(),
                event.getTime(),
                event.getEventType(),
                event.getStaffId(),
                event.getStoreType());
        batchStatement.add(boundInsert);
    }
    session.execute(batchStatement);
}
@chbatey
Not so much
[diagram: client → coordinator (C) → batch written to the BATCH LOG on batch log replicas]
BL-R: Batch log replica
@chbatey
Use case?
CREATE TABLE IF NOT EXISTS customer_events (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (customer_id, time));
CREATE TABLE IF NOT EXISTS customer_events_by_staff (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (staff_id, time));
@chbatey
Storing events to both tables in a batch
public void storeEventLogged(ConsistencyLevel consistencyLevel, CustomerEvent customerEvent) {
    BoundStatement boundInsertForCustomerId = insertByCustomerId.bind(
            customerEvent.getCustomerId(),
            customerEvent.getTime(),
            customerEvent.getEventType(),
            customerEvent.getStaffId(),
            customerEvent.getStoreType());
    BoundStatement boundInsertForStaffId = insertByStaffId.bind(
            customerEvent.getCustomerId(),
            customerEvent.getTime(),
            customerEvent.getEventType(),
            customerEvent.getStaffId(),
            customerEvent.getStoreType());

    BatchStatement batchStatement = new BatchStatement(BatchStatement.Type.LOGGED);
    batchStatement.enableTracing();
    batchStatement.setConsistencyLevel(consistencyLevel);
    batchStatement.add(boundInsertForCustomerId);
    batchStatement.add(boundInsertForStaffId);

    ResultSet execute = session.execute(batchStatement);
}
@chbatey
Mutable data
@chbatey
Distributed mutable state :-(
create TABLE accounts(customer text PRIMARY KEY, balance_in_pence int);
Overly naive example
• Cassandra uses last write wins (no Vector clocks)
• Read before write is an anti-pattern - race conditions and latency
@chbatey
Take a page from Event sourcing
create table accounts_log(
customer text,
time timeuuid,
delta int,
primary KEY (customer, time));
All records for the same customer are in the same storage row
Transactions are ordered by TIMEUUID - UUIDs are your friends in distributed systems
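A minimal sketch of working with this log via the driver (UUIDs.timeBased() from com.datastax.driver.core.utils generates the TIMEUUID client side):

PreparedStatement appendDelta = session.prepare(
        "INSERT INTO accounts_log (customer, time, delta) VALUES (?, ?, ?)");
session.execute(appendDelta.bind("chbatey", UUIDs.timeBased(), 1000));  // +£10.00
session.execute(appendDelta.bind("chbatey", UUIDs.timeBased(), -250));  // -£2.50

// Balance = fold over the immutable deltas (or over a rolled-up snapshot plus the tail)
int balance = 0;
for (Row row : session.execute("SELECT delta FROM accounts_log WHERE customer = 'chbatey'")) {
    balance += row.getInt("delta");
}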
@chbatey
Tips
• Avoid mutable state in a distributed system; favour an immutable log
• You can roll up and snapshot to stop the calculation getting too big
• If you really have to mutate in place, check out LWTs and do it via CAS - this will be slower (see the sketch below)
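For that last tip, a sketch of the LWT/CAS route against the naive accounts table (the [applied] column in the result says whether the compare-and-set won; it costs a Paxos round trip):

Row current = session.execute(
        "SELECT balance_in_pence FROM accounts WHERE customer = 'chbatey'").one();
int oldBalance = current.getInt("balance_in_pence");

// Only applies if nobody changed the balance in between
Row result = session.execute(
        "UPDATE accounts SET balance_in_pence = ? WHERE customer = 'chbatey' IF balance_in_pence = ?",
        oldBalance - 250, oldBalance).one();
boolean applied = result.getBool("[applied]");  // false => lost the race: re-read and retry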
@chbatey
Understanding failures
(crazy retry policies)
@chbatey
Write timeout
[diagram: application → coordinator (C) → replicas R1, R2, R3; replication factor 3, CL = QUORUM; R1 and R2 time out, so the coordinator returns a write timeout]
@chbatey
Retrying writes
• Cassandra does not roll back!
• R3 has the data and the coordinator has hinted for R1
and R2
@chbatey
Write timeout
• Received acknowledgements
• Required acknowledgements
• Consistency level
• CAS and Batches are more complicated:
- WriteType: SIMPLE, BATCH, UNLOGGED_BATCH, BATCH_LOG, CAS (see the sketch below)
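A sketch of inspecting all of this from the driver side, assuming the Java driver's WriteTimeoutException (boundInsert is any bound write statement):

try {
    session.execute(boundInsert);
} catch (WriteTimeoutException e) {
    // received vs required acknowledgements, the consistency level used, and the WriteType
    System.out.printf("type=%s received=%d required=%d cl=%s%n",
            e.getWriteType(), e.getReceivedAcknowledgements(),
            e.getRequiredAcknowledgements(), e.getConsistencyLevel());
    // the write may already be applied on some replicas - only retry blindly if it is idempotent
}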
@chbatey
Batches
• BATCH_LOG
- Timed out waiting for batch log replicas
• BATCH
- Written to batch log but timed out waiting for actual replica
- Will eventually be committed
• UNLOGGED_BATCH
@chbatey
Idempotent writes
• All writes are idempotent with the following exceptions:
- Counters
- Lists
@chbatey
Cassandra as a Queue (tombstones)
@chbatey
Well documented anti-pattern
• http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
• http://lostechies.com/ryansvihla/2014/10/20/domain-modeling-around-deletes-or-using-cassandra-as-a-queue-even-when-you-know-better/
@chbatey
Requirements
• Produce data in one DC
• Consume it once and delete from another DC
• Use Cassandra?
@chbatey
Datamodel
CREATE TABLE queues (
name text,
enqueued_at timeuuid,
payload blob,
PRIMARY KEY (name, enqueued_at)
);
SELECT enqueued_at, payload
FROM queues
WHERE name = 'queue-1'
LIMIT 1;
@chbatey
Trace
activity | source | elapsed
-------------------------------------------+-----------+--------
execute_cql3_query | 127.0.0.3 | 0
Parsing statement | 127.0.0.3 | 48
Preparing statement | 127.0.0.3 | 362
Message received from /127.0.0.3 | 127.0.0.1 | 42
Sending message to /127.0.0.1 | 127.0.0.3 | 718
Executing single-partition query on queues | 127.0.0.1 | 145
Acquiring sstable references | 127.0.0.1 | 158
Merging memtable contents | 127.0.0.1 | 189
Merging data from memtables and 0 sstables | 127.0.0.1 | 235
Read 1 live and 19998 tombstoned cells | 127.0.0.1 | 251102
Enqueuing response to /127.0.0.3 | 127.0.0.1 | 252976
Sending message to /127.0.0.3 | 127.0.0.1 | 253052
Message received from /127.0.0.1 | 127.0.0.3 | 324314
Not good :(
@chbatey
Tips
• The problem: lots of deletes + range queries in the same partition
• Avoid it with data modelling / query patterns:
- Move partition
- Add a start for your range, e.g. enqueued_at > 9d1cb818-9d7a-11b6-96ba-60c5470cbf0e (see the sketch below)
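A sketch of the range-start tip, assuming the queues table above and UUIDs.startOf() from the driver utils to build a checkpoint TIMEUUID (in practice the consumer would persist the last enqueued_at it processed):

UUID lastSeen = UUIDs.startOf(System.currentTimeMillis() - 60_000);  // hypothetical checkpoint
Row next = session.execute(
        "SELECT enqueued_at, payload FROM queues WHERE name = ? AND enqueued_at > ? LIMIT 1",
        "queue-1", lastSeen).one();
if (next != null) {
    lastSeen = next.getUUID("enqueued_at");  // advance past processed (and soon-deleted) cells
}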
@chbatey
Secondary Indices
@chbatey
Customer events table
CREATE TABLE IF NOT EXISTS customer_events (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (customer_id, time));
CREATE INDEX ON customer_events (staff_id);
@chbatey
Indexes to the rescue?
customer_events table:
customer_id | time                | staff_id
chbatey     | 2015-03-03 08:52:45 | trevor
chbatey     | 2015-03-03 08:52:54 | trevor
chbatey     | 2015-03-03 08:53:11 | bill
chbatey     | 2015-03-03 08:53:18 | bill
rusty       | 2015-03-03 08:56:57 | bill
rusty       | 2015-03-03 08:57:02 | bill
rusty       | 2015-03-03 08:57:20 | trevor
staff_id index:
staff_id | customer_id
trevor   | chbatey
trevor   | chbatey
bill     | chbatey
bill     | chbatey
bill     | rusty
bill     | rusty
trevor   | rusty
@chbatey
Inserts
INSERT INTO customer_events (customer_id, time , event_type , staff_id, store_type ) VALUES
( 'rusty', 42730bf0-c183-11e4-a4c6-1971740a12cd, 'SELL', 'trevor', 'WEB');
INSERT INTO customer_events (customer_id, time , event_type , staff_id, store_type ) VALUES
( 'rusty', 3f07cd70-c183-11e4-a4c6-1971740a12cd, 'BUY', 'trevor', 'WEB');
INSERT INTO customer_events (customer_id, time , event_type , staff_id, store_type ) VALUES
( 'chbatey', a9550c70-c182-11e4-a4c6-1971740a12cd, 'BUY', 'trevor', 'WEB');
INSERT INTO customer_events (customer_id, time , event_type , staff_id, store_type ) VALUES
( 'chbatey', aebbf3e0-c182-11e4-a4c6-1971740a12cd, 'SELL', 'trevor', 'WEB');
INSERT INTO customer_events (customer_id, time , event_type , staff_id, store_type ) VALUES
( 'chbatey', b8c2f050-c182-11e4-a4c6-1971740a12cd, 'VIEW', 'bill', 'WEB');
INSERT INTO customer_events (customer_id, time , event_type , staff_id, store_type ) VALUES
( 'chbatey', bcb5ae50-c182-11e4-a4c6-1971740a12cd, 'BUY', 'bill', 'WEB');
@chbatey
Secondary indexes are local
• The staff_id partition in the secondary index is not
distributed like a normal table
• The secondary index entries are only stored on the node
that contains the customer_id partition
@chbatey
Indexes to the rescue?
Node A (owns the 'chbatey' partition) - local staff_id index entries:
staff_id | customer_id
trevor   | chbatey
trevor   | chbatey
bill     | chbatey
bill     | chbatey
Node B (owns the 'rusty' partition) - local staff_id index entries:
staff_id | customer_id
bill     | rusty
bill     | rusty
trevor   | rusty
customer_events table:
customer_id | time                | staff_id
chbatey     | 2015-03-03 08:52:45 | trevor
chbatey     | 2015-03-03 08:52:54 | trevor
chbatey     | 2015-03-03 08:53:11 | bill
chbatey     | 2015-03-03 08:53:18 | bill
rusty       | 2015-03-03 08:56:57 | bill
rusty       | 2015-03-03 08:57:02 | bill
rusty       | 2015-03-03 08:57:20 | trevor
staff_id index (logical view):
staff_id | customer_id
trevor   | chbatey
trevor   | chbatey
bill     | chbatey
bill     | chbatey
bill     | rusty
bill     | rusty
trevor   | rusty
@chbatey
Homework
1. Use a cluster with 6 nodes (CCM)
2. Create the customer events table and add the secondary index
3. Insert the data (insert statements are in a hidden slide I’ll
distribute)
4. Turn tracing on and execute the following queries:
A. select * from customer_events where staff_id = 'trevor';
B. select * from customer_events where staff_id = 'trevor' and customer_id = 'chbatey';
5. How many partitions queried and nodes were used for each
query?
6. Send me the answer on twitter, first couple get SWAG!
@chbatey
Do-it-yourself index
CREATE TABLE IF NOT EXISTS customer_events (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (customer_id, time));
CREATE TABLE IF NOT EXISTS customer_events_by_staff (
customer_id text,
staff_id text,
store_type text,
time timeuuid,
event_type text,
PRIMARY KEY (staff_id, time));
@chbatey
Possibly in 3.0 - Global indexes
@chbatey
Anti-patterns summary
• Most anti-patterns are very obvious in trace output
• Don't test only on small clusters; most problems only show up at scale
@chbatey
Thanks for listening
• Check out my blog: http://christopher-batey.blogspot.co.uk/
- Answers for all questions
- Detailed posts on most of the anti-patterns + full working code
• More questions? Contact me on twitter: @chbatey
• To learn more about the read / write path and CCM, check out DataStax Academy
- https://academy.datastax.com/