Cassandra 3.0 Data Modeling

@PatrickMcFadin
Patrick McFadin 
Chief Evangelist, DataStax
Cassandra 3.0 Data Modeling
1

CQL 3.0 - Cassandra 1.2
• Goodbye CQL 2.0!
• Custom secondary indexes
• Empty IN

• Aliases
• CREATE <table> IF NOT EXISTS
• INSERT IF NOT EXISTS
• UPDATE IF
• DELETE IF EXISTS
• IN supports cluster columns
LWT

• User Defined Types
• Collection Indexing
• Indexes can use contains
• Tuples?

User Deﬁned Types
CREATE TYPE video_metadata (
height int,
width int,
video_bit_rate set<text>,
encoding text
);

User Deﬁned Types
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
metadata set <frozen<video_metadata>>,
added_date timestamp,
PRIMARY KEY (videoid)
);

• Date and Time are now types
• TinyInt and SmallInt
• User Defined Functions
• Aggregates
• User Defined Aggregates

User Deﬁned Functions
CREATE TABLE video_rating (
videoid uuid,
rating_counter counter,
rating_total counter,
);
CREATE OR REPLACE FUNCTION
avg_rating (rating_counter counter, rating_total counter)
CALLED ON NULL INPUT
RETURNS double
LANGUAGE java AS
'return Double.valueOf(rating_total.doubleValue()/
rating_counter.doubleValue());';

User Deﬁned Functions
SELECT avg_rating(rating_counter, rating_total) AS avg_rating
FROM video_rating
WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;

Aggregates
CREATE TABLE video_ratings_by_user (
videoid uuid,
userid uuid,
rating int,
PRIMARY KEY (videoid, userid)
);
SELECT count(userid)
FROM video_ratings_by_user
WHERE videoed = 49f64d40-7d89-4890-b910-dbf923563a33

CQL 3.4 - Cassandra 3.x
• CAST operator
• Per Partition Limit
• Materialized Views
• SASI

Materialized View
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
metadata set <frozen<video_metadata>>,
);
Lookup by this?

Materialized View
CREATE TABLE videos_by_location (
videoid uuid,
userid uuid,
location text,
PRIMARY KEY (location, videoid)
);
Roll your own

Materialized View
CREATE MATERIALIZED VIEW videos_by_location
AS SELECT userid, added_date, videoid, location
FROM videos
WHERE videoId IS NOT NULL AND location IS NOT NULL
PRIMARY KEY(location, videoid);
Cassandra rolls for you

Materialized View Perf
5 Materialized Views vs 5 tables writes async

Materialized View
SELECT location, videoid
FROM videos_by_location ;
location | videoid
-------------------------------------------------+--------------------------------------
http://www.youtube.com/watch?v=px6U2n74q3g | 06049cbb-dfed-421f-b889-5f649a0de1ed
http://www.youtube.com/watch?v=qphhxujn5Es | 873ff430-9c23-4e60-be5f-278ea2bb21bd
/us/vid/0c/0c3f7e87-f6b6-41d2-9668-2b64d117102c | 0c3f7e87-f6b6-41d2-9668-2b64d117102c
/us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401 | 99051fe9-6a9c-46c2-b949-38ef78858dd0
/us/vid/b3/b3a76c6b-7c7f-4af6-964f-803a9283c401 | b3a76c6b-7c7f-4af6-964f-803a9283c401
http://www.youtube.com/watch?v=HdJlsOZVGwM | 49f64d40-7d89-4890-b910-dbf923563a33
/us/vid/41/416a5ddc-00a5-49ed-adde-d99da9a27c0c | 416a5ddc-00a5-49ed-adde-d99da9a27c0c

SASI
CREATE TABLE users (
userid uuid,
firstname varchar,
lastname varchar,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);
Lookup by this?

Storage Attached Secondary Index

SASI
CREATE CUSTOM INDEX ON users (firstname)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'analyzer_class':
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
'case_sensitive': 'false'
};

SASI
CREATE CUSTOM INDEX ON users (lastname)
WITH OPTIONS = {'mode': 'CONTAINS'};

SASI
CREATE CUSTOM INDEX ON users (created_date)
WITH OPTIONS = {'mode': 'SPARSE'};

SASI Indexes
Client
INSERT INTO users(userid,firstname,lastname,email,created_date) 
VALUES (9761d3d7-7fbd-4269-9988-6cfd4e188678,’Patrick’,’McFadin’,
’patrick@datastax.com’,’2015-06-01’);
userid 1
userid 2
Memtable
SSTable
SSTable
SSTable
SASI Index
Node
Data
lastname
lastname
firstname
firstname
email
email
created_date
created_date
SASI Index
SASI Index
Indexer

SASI Queries
SELECT * FROM users WHERE firstname LIKE 'pat%';
SELECT * FROM users WHERE lastname LIKE ‘%Fad%';
SELECT * FROM users WHERE email LIKE '%data%';
SELECT * FROM users
WHERE created_date > '2011-6-15'
AND created_date < '2011-06-30';
userid | created_date | email | firstname | lastname
--------------------------------------+---------------------------------+----------------------+-----------+----------
9761d3d7-7fbd-4269-9988-6cfd4e188678 | 2011-06-20 20:50:00.000000+0000 | patrick@datastax.com | Patrick | McFadin

SASI Guidelines
• Multiple fields to search
• No more than 1000 rows returned
• You know the partition key
• Indexing static columns
Use SASI when…

SASI Guidelines
• Searching large partitions
• Tight SLA on reads
• Search for analytics
• Ordering search is important
Don’t Use SASI when…

Cassandra 3.0 Data Modeling

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Cassandra 3.0 Data Modeling

Similar to Cassandra 3.0 Data Modeling (20)

More from DataStax Academy

More from DataStax Academy (20)

Recently uploaded

Recently uploaded (20)

Cassandra 3.0 Data Modeling