Cassandra20141113

OVERVIEW AND REAL WORLD APPLICATIONS
Cassandra
Jersey Shore Tech Meetup
Nov 13, 2014

You Are Not Here…
*** http://njhalloffame.org/

Agenda
• Some Basic Concepts/Overview
• New Developments in Cassandra
• Basic Data Modeling Concepts
• Materialized Views
• Secondary Indexes
• Counters
• Time Series Data
• Expiring Data

Cassandra High Level
Cassandra's architecture is based on the combination of two technologies:
• Google BigTable – Data Model
• Amazon Dynamo – Distributed Architecture
BTW, these mean the same thing: Cassandra = C*

Architecture Basics & Terminology
• Nodes are single instances of C*.
• A cluster is a group of nodes.
• Data is organized by keys (tokens), which are distributed across the cluster.
• Replication Factor (RF) determines how many copies of each key's data are kept.
• Data Center Aware – works well in multi-DC/EC2 etc.
• Consistency Level – powerful feature to tune consistency vs. speed vs. availability.
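
To make the token and replication factor ideas concrete, here is a minimal Python sketch of a token ring. It is an illustration only, not Cassandra's implementation: the MD5 hash, the one-token-per-node layout, the node names and the Ring class are all assumptions made for the example.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Hash a string to an integer token (MD5 is used purely for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring (hypothetical): each node owns exactly one token."""

    def __init__(self, nodes, rf):
        if rf > len(nodes):
            raise ValueError("rf cannot exceed the number of nodes")
        self.rf = rf
        # Sort nodes by token so the ring can be walked clockwise.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key: str):
        """Return the rf distinct nodes that hold a copy of this key."""
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], rf=3)
owners = ring.replicas("tpeters@example.com")
print(owners)  # three distinct node names; which ones depends on the hash
```

With rf=3 on four nodes, every key lands on three different nodes, so any single node can be down without losing access to the data.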

More Architecture
• Information on who has what data and who is available is exchanged between nodes using gossip.
• No single point of failure (SPOF); every node can service requests.
• Handles replication and downed nodes (within reason).

CAP Theorem
• Distributed Systems Law:
  • Consistency
  • Availability
  • Partition Tolerance
(you can only really have two in a distributed system)
• Cassandra is AP with Eventual Consistency

Consistency
• Cassandra uses the concept of Tunable Consistency, which makes it very powerful and flexible for system needs.
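
A rough way to reason about tunable consistency is to count replica acknowledgements. The Python sketch below assumes only three of the available consistency levels (ONE, QUORUM, ALL) and uses hypothetical helper names. It applies the common rule of thumb that a read is guaranteed to overlap the latest write when read acks plus write acks exceed the replication factor.

```python
def required_acks(level: str, rf: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level in this sketch: {level}")


def read_overlaps_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return required_acks(write_level, rf) + required_acks(read_level, rf) > rf


for w, r in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(w, r, read_overlaps_write(w, r, rf=3))
```

Lower levels answer faster and tolerate more downed nodes; higher levels trade that away for stronger guarantees on what a read returns.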

Data Model Architecture
• Keyspace – container of column families (tables). Defines the replication factor (RF), among other settings.
• Table – column family. Contains the schema definition.
• Row – a “record” identified by a key.
• Column – a key and a value.

Deletions
• Distributed systems present a unique problem for deletes. If Cassandra actually deleted the data and a node was down and didn’t receive the delete notice, that node would try to recreate the record when it came back online. So…
• Tombstone – the data is replaced with a special value called a tombstone, which works within the distributed architecture.
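
The resurrection problem can be shown with a small Python sketch. It models each replica as a dict of key to (timestamp, value) and merges replicas with a last-write-wins rule. The merge function, the TOMBSTONE marker and the timestamps are all assumptions for illustration, not Cassandra's actual repair code.

```python
TOMBSTONE = object()  # hypothetical marker standing in for a deleted value


def merge(a: dict, b: dict) -> dict:
    """Last-write-wins merge of two replica states {key: (timestamp, value)}."""
    merged = dict(a)
    for key, cell in b.items():
        if key not in merged or cell[0] > merged[key][0]:
            merged[key] = cell
    return merged


def visible(state: dict) -> dict:
    """What a client would see: everything that is not a tombstone."""
    return {k: v for k, (ts, v) in state.items() if v is not TOMBSTONE}


# Both replicas start with the same record, written at timestamp 1.
stale_replica = {"user:1": (1, "tom")}  # this node is down during the delete

# Naive delete: the live replica simply forgets the record.
naive = merge({}, stale_replica)

# Tombstone delete: the live replica records a newer "deleted" marker.
with_tombstone = merge({"user:1": (2, TOMBSTONE)}, stale_replica)

print(visible(naive))           # the deleted record comes back
print(visible(with_tombstone))  # the delete wins
```

Because the tombstone carries a newer timestamp than the stale value, it wins the merge when the downed node returns.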

Keys
• Primary Key
• Partition Key – identifies a row
• Clustering Key – sorting within a row
• Using CQL, these are defined together as a compound (composite) key.
• Compound keys are how you implement “wide rows”, the COOL FEATURE!
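
One way to picture a wide row is as a map from partition key to a list of entries kept sorted by clustering key. The Python sketch below is a toy model under that assumption; the WideRowTable class and its methods are hypothetical, and the sample values reuse the emailaddress/department columns from this deck.

```python
import bisect
from collections import defaultdict


class WideRowTable:
    """Toy model: partition key -> entries kept sorted by clustering key."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, partition_key, clustering_key, row):
        rows = self.partitions[partition_key]
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, row)  # same full key: overwrite (upsert)
        else:
            rows.insert(i, (clustering_key, row))  # keep clustering order

    def select(self, partition_key):
        """Return all entries of one partition, in clustering-key order."""
        return list(self.partitions.get(partition_key, []))


users = WideRowTable()
users.insert("tpeters@example.com", "sales", {"firstname": "tom"})
users.insert("tpeters@example.com", "engineering", {"firstname": "tom"})
users.insert("tpeters@example.com", "sales", {"firstname": "tim"})

for department, row in users.select("tpeters@example.com"):
    print(department, row)
```

Reading one partition returns its entries already sorted, which is why access by partition key is cheap and why the key definition drives the data model.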

Single Primary Key
CREATE TABLE users (
    user_id UUID PRIMARY KEY,
    firstname text,
    lastname text,
    emailaddress text
);
** Cassandra Data Types
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/cql_data_types_c.html

Compound Key
CREATE TABLE users (
    emailaddress text,
    department text,
    firstname text,
    lastname text,
    PRIMARY KEY (emailaddress, department)
);
• Partition key plus clustering key
  • emailaddress is the partition key
  • department is the clustering key

Compound Key
CREATE TABLE users (
    emailaddress text,
    department text,
    country text,
    firstname text,
    lastname text,
    PRIMARY KEY ((emailaddress, department), country)
);
• Partition key plus clustering key
  • emailaddress and department together form the partition key
  • country is the clustering key

New Rules
• Writes Are Cheap
• Denormalize All You Need
• Model Your Queries, Not Data (understand access patterns)
• Application Worries About Joins

What’s New In 2.0
Conditional DDL
IF EXISTS or IF NOT EXISTS
Drop Column Support
ALTER TABLE users DROP lastname;

More New Stuff
• Triggers
CREATE TRIGGER myTrigger
ON myTable
USING 'com.thejavaexperts.cassandra.updateevt';
• Lightweight Transactions (CAS)
UPDATE users
SET firstname = 'tim'
WHERE emailaddress = 'tpeters@example.com'
IF firstname = 'tom';
** Not like an ACID transaction!!

CAS & Transactions
• CAS – compare-and-set operations. A single atomic operation compares the value of a column in the database and applies a modification depending on the result of the comparison.
• Consider the performance hit. CAS is (was) considered an anti-pattern.
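
The compare-and-set idea itself is small. The Python sketch below shows it on a single in-memory value guarded by a lock. This is an assumption for illustration only: Cassandra coordinates the comparison across replicas with Paxos, which is where the performance cost comes from, and the Cell class here is hypothetical.

```python
import threading


class Cell:
    """A single value supporting an atomic compare-and-set (toy model)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new) -> bool:
        """Set the value to `new` only if it currently equals `expected`."""
        with self._lock:  # compare and update happen as one atomic step
            if self._value != expected:
                return False
            self._value = new
            return True

    def get(self):
        with self._lock:
            return self._value


firstname = Cell("tom")
first_attempt = firstname.compare_and_set("tom", "tim")   # condition holds
second_attempt = firstname.compare_and_set("tom", "tad")  # value is now 'tim'
print(first_attempt, second_attempt, firstname.get())
```

This mirrors the UPDATE ... IF firstname = 'tom' statement: the write is applied only when the condition holds at the moment of the write.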

Data Modeling… The Basics
• Cassandra now looks very familiar to RDBMS/SQL users.
• It very nicely hides the underlying data storage model.
• You still have all the power of Cassandra; it is all in the key definition.
RDBMS = model data
Cassandra = model access (queries)

Side-Note On Querying
• Create table with compound key
• Select using ALLOW FILTERING
• Counts
• Select using IN or =

Batch Operations
• Saves network round trips
• Can contain INSERT, UPDATE, DELETE
• Atomic by default (all or nothing)
• Can use a timestamp for specific ordering

Batch Operation Example
BEGIN BATCH
INSERT INTO users (emailaddress, firstname, lastname, country)
VALUES ('brian.enochson@gmail.com', 'brian', 'enochson', 'USA');
INSERT INTO users (emailaddress, firstname, lastname, country)
VALUES ('tpeters@example.com', 'tom', 'peters', 'DE');
INSERT INTO users (emailaddress, firstname, lastname, country)
VALUES ('jsmith@example.com', 'jim', 'smith', 'USA');
INSERT INTO users (emailaddress, firstname, lastname, country)
VALUES ('arogers@example.com', 'alan', 'rogers', 'USA');
DELETE FROM users WHERE emailaddress = 'jsmith@example.com';
APPLY BATCH;
• Select in cqlsh
• List in cassandra-cli with timestamp

More Data Modeling…
• No Joins
• No Foreign Keys
• No Third (or any other) Normal Form Concerns
• Redundant Data Encouraged. Apps maintain consistency.

Secondary Indexes
• Let you define indexes so data can be accessed by columns other than the partition key.
• Each node has a local index for its data.
• They have their uses, but shouldn’t be used everywhere without consideration.
• We will look at alternatives.

Secondary Index Example
• Create a table
• Try to select with a column not in the PK
• Add Secondary Index
• Try select again (may need to reinsert)

When to use?
• Low Cardinality – small number of unique values
• High Cardinality – high number of distinct values
• Secondary indexes are good for low cardinality: country codes, department codes, etc. Not email addresses.

Materialized View
• If you want full distribution, you can use what is called the Materialized View pattern.
• Remember, redundant data is fine.
• Model the queries.
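
As a sketch of the pattern, the Python below keeps two hypothetical "tables" (plain dicts) for the same user data, one per query, and has the application write to both. The table and function names are assumptions for illustration; the sample values come from the batch example in this deck.

```python
# One "table" per query; the application keeps them in step.
users_by_email = {}    # query: look up one user by email address
users_by_country = {}  # query: list all users in a country


def save_user(emailaddress, firstname, lastname, country):
    """Write the same record to every table that serves a query."""
    row = {"emailaddress": emailaddress, "firstname": firstname,
           "lastname": lastname, "country": country}
    users_by_email[emailaddress] = row
    users_by_country.setdefault(country, {})[emailaddress] = row


save_user("brian.enochson@gmail.com", "brian", "enochson", "USA")
save_user("tpeters@example.com", "tom", "peters", "DE")
save_user("arogers@example.com", "alan", "rogers", "USA")

print(users_by_email["tpeters@example.com"]["country"])
print(sorted(users_by_country["USA"]))
```

Each read is a single lookup by key; the cost is the extra write, which fits the "writes are cheap" rule.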

Materialized View Example
• Show a normal table with a compound key and its querying limitations.
• Create a Materialized View table with a different compound key to support alternate access.
• Selects use the partition key.
• Secondary indexes are local, not distributed.
• ALLOW FILTERING can cause performance issues.

Counters
• Updated in 2.1 and now work in a more distributed and accurate manner.
• Table organization, example
• How to update, view etc.

Time Series Example….
• Time series table model.
• Need to consider the interval for event frequency and the wide row size.
• Make what is tracked, together with the time interval unit, the partition key.
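
A minimal Python sketch of that bucketing idea follows. It assumes a hypothetical sensor identifier as the tracked thing and one day as the interval; both are illustrative choices, and a busier event stream might need an hourly bucket to keep rows from growing too wide.

```python
from datetime import datetime, timezone


def partition_key(sensor_id: str, event_time: datetime) -> tuple:
    """Combine what is tracked with a day bucket so each partition stays bounded."""
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)


late = datetime(2014, 11, 13, 23, 59, tzinfo=timezone.utc)
next_day = datetime(2014, 11, 14, 0, 1, tzinfo=timezone.utc)

print(partition_key("sensor-1", late))
print(partition_key("sensor-1", next_day))
```

Events for the same sensor on the same day share a partition, and the event timestamp would then act as the clustering key inside it.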

Time Series Data
• Due to its fast write path, Cassandra is well suited for storing time series data.
• The Cassandra wide row is a perfect fit for modeling time series / time-based events.
• Let’s look at an example…

Event Data
• Notice the primary key and clustering key.
• Insert some data.
• View in CQL, then in the CLI as a wide row.

TTL – Self Expiring Data
• Another technique is data that has a defined lifespan.
• For instance, session identifiers, temporary passwords, etc.
• For this, Cassandra provides a Time To Live (TTL) mechanism.
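
The behavior of a TTL can be sketched in a few lines of Python. This toy store is an assumption for illustration, not how Cassandra implements expiry; the TTLStore class is hypothetical and the clock is passed in explicitly so the example is deterministic.

```python
class TTLStore:
    """Toy key-value store where every value carries an expiry time."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl_seconds, now):
        self._data[key] = (value, now + ttl_seconds)

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._data[key]  # lifespan is over: drop the value
            return None
        return value


sessions = TTLStore()
sessions.put("session-42", "tpeters@example.com", ttl_seconds=30, now=1000)

print(sessions.get("session-42", now=1010))  # still inside the 30 second lifespan
print(sessions.get("session-42", now=1031))  # past the lifespan
```

The application never issues a delete; the data simply stops being returned once its lifespan ends.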

TTL Example…
• Create table
• Insert data using TTL
• Can update a specific column with a TTL
• Show using selects

Questions
• http://www.thejavaexperts.net/
• Email: brian.enochson@gmail.com
• Twitter: @benochso
• G+: https://plus.google.com/+BrianEnochson
