This document provides an overview of Cassandra data modeling concepts. It discusses Cassandra data types like collections (sets, lists, maps) and how to model different types of tables, including static, dynamic, and time series tables. It also covers primary keys, clustering columns, query patterns, and other Cassandra features like lightweight transactions and user defined functions. The overall document is a guide to understanding Cassandra data modeling fundamentals.
2. Where are we now?
Pt. 1 - Transition from Oracle to Cassandra
How does it work? What are the differences?
Pt. 2 - Cassandra Data Model
All the details on how to use Cassandra Query Language (CQL)
Pt. 3 - Building a Cassandra Application
Wrap it all up with a top-down application design, combining everything we’ve
learned plus a bit more.
9. Insert
INSERT INTO videos (videoid, name, userid, description, location, location_type, preview_thumbnails, tags, added_date)
VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.',
9761d3d7-7fbd-4269-9988-6cfd4e188678,
'First in a three part series for Cassandra Data Modeling','http://www.youtube.com/watch?v=px6U2n74q3g',1,
{'YouTube':'http://www.youtube.com/watch?v=px6U2n74q3g'},{'cassandra','data model','relational','instruction'},
'2013-05-02 12:30:29');
Callouts: table name (videos); fields (the column list); values (the VALUES list); partition key (videoid): required.
10. Partition keys
06049cbb-dfed-421f-b889-5f649a0de1ed Murmur3 Hash Token = 7224631062609997448
873ff430-9c23-4e60-be5f-278ea2bb21bd Murmur3 Hash Token = -6804302034103043898
Consistent hash (Murmur3Partitioner): a signed 64-bit token between -2^63 and 2^63 - 1
INSERT INTO videos (videoid, name, userid, description)
VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.',
9761d3d7-7fbd-4269-9988-6cfd4e188678, 'First in a three part series for Cassandra Data Modeling');
INSERT INTO videos (videoid, name, userid, description)
VALUES (873ff430-9c23-4e60-be5f-278ea2bb21bd,'Become a Super Modeler',
9761d3d7-7fbd-4269-9988-6cfd4e188678, 'Second in a three part series for Cassandra Data Modeling');
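The token decides which node stores a partition. The sketch below is a hypothetical 4-node ring with one evenly spaced token per node and no vnodes (real clusters usually give each node many tokens). It does not compute Murmur3 itself; it reuses the two token values from the slide and shows that each node owns the range from the previous node's token (exclusive) up to its own token (inclusive), wrapping around at the top.

```python
from bisect import bisect_left

# Hypothetical ring: 4 nodes, one evenly spaced token each (no vnodes).
MIN_TOKEN = -2**63
RING = [(MIN_TOKEN + i * 2**62, f"node{i + 1}") for i in range(4)]
TOKENS = [token for token, _ in RING]


def owner(token: int) -> str:
    """Return the node owning `token`.

    A node owns (previous node's token, its own token]; tokens above the
    largest node token wrap around to the first node on the ring.
    """
    i = bisect_left(TOKENS, token)
    return RING[i % len(RING)][1]


# Token values copied from the slide (Murmur3 hashes of the two videoid values).
print(owner(7224631062609997448))   # node1 (wraps around the ring)
print(owner(-6804302034103043898))  # node2
```

Because ownership depends only on the token, any coordinator can route a query that supplies the partition key without asking other nodes where the data lives.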
17. Select
name | description | added_date
---------------------------------------------------+----------------------------------------------------------+--------------------------
The data model is dead. Long live the data model. | First in a three part series for Cassandra Data Modeling | 2013-05-02 12:30:29-0700
SELECT name, description, added_date
FROM videos
WHERE videoid = 06049cbb-dfed-421f-b889-5f649a0de1ed;
Callouts: fields (name, description, added_date); table name (videos); primary key: the partition key (videoid) is required.
18. Controlling Order
CREATE TABLE raw_weather_data (
wsid text,
year int,
month int,
day int,
hour int,
temperature double,
PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,10,-5.6);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,9,-5.1);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,8,-4.9);
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,7,-5.3);
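CLUSTERING ORDER BY fixes the order of rows inside a partition regardless of the order in which they were written. A minimal Python model of that, assuming the four rows above arrive shuffled: because every clustering column is DESC, reversing the comparison of the whole (year, month, day, hour) tuple gives the same order Cassandra stores and returns.

```python
# The four rows from the INSERTs above, deliberately listed out of order
# to show that write order does not affect the stored order.
rows = [
    ("10010:99999", 2005, 12, 1, 8, -4.9),
    ("10010:99999", 2005, 12, 1, 10, -5.6),
    ("10010:99999", 2005, 12, 1, 7, -5.3),
    ("10010:99999", 2005, 12, 1, 9, -5.1),
]


def clustering_key(row):
    """Return the clustering columns (year, month, day, hour) of a row."""
    _wsid, year, month, day, hour, _temperature = row
    return (year, month, day, hour)


# All clustering columns are DESC, so one reverse sort on the tuple matches
# CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC).
partition = sorted(rows, key=clustering_key, reverse=True)

for row in partition:
    print(row[4], row[5])
# 10 -5.6
# 9 -5.1
# 8 -4.9
# 7 -5.3
```

With mixed directions (say year DESC, hour ASC) a single `reverse=True` would no longer be correct; each DESC column would need to be negated inside the key instead.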
20. Write Path
Client:
INSERT INTO raw_weather_data(wsid,year,month,day,hour,temperature)
VALUES ('10010:99999',2005,12,1,7,-5.3);
[Diagram: Client → Node. The write is appended to the Commit Log and stored in the Memtable (rows keyed by wsid, year, month, day, hour); memtables are flushed to SSTables on disk, and Compaction merges SSTables using Temp files.]
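The write path can be sketched as a toy log-structured store. This is a hypothetical, heavily simplified model of one node, not Cassandra's implementation: the class name, `flush_threshold`, and clearing the whole commit log on flush are assumptions made to keep it short. Writes go to an append-only log and an in-memory memtable; a full memtable is flushed as an immutable, sorted SSTable; a read takes the value with the newest write timestamp from wherever it lives.

```python
class ToyNode:
    """Hypothetical model of one node's write path (commit log, memtable, SSTables)."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold
        self.commit_log = []  # append-only record, replayed after a crash
        self.memtable = {}    # key -> (timestamp, value), in memory
        self.sstables = []    # immutable sorted lists of (key, timestamp, value)

    def write(self, key, value, timestamp):
        self.commit_log.append((key, timestamp, value))
        current = self.memtable.get(key)
        if current is None or timestamp >= current[0]:
            self.memtable[key] = (timestamp, value)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as a new sorted, immutable SSTable.
        self.sstables.append(sorted((k, ts, v) for k, (ts, v) in self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # simplification: flushed data no longer needs replay

    def read(self, key):
        """Return the value with the newest timestamp across memtable and SSTables."""
        candidates = [self.memtable[key]] if key in self.memtable else []
        for sstable in self.sstables:
            candidates.extend((ts, v) for k, ts, v in sstable if k == key)
        return max(candidates)[1] if candidates else None


node = ToyNode(flush_threshold=2)
node.write((2005, 12, 1, 7), -5.3, timestamp=1)
node.write((2005, 12, 1, 8), -4.9, timestamp=2)  # memtable is full: flushed to an SSTable
node.write((2005, 12, 1, 7), -5.0, timestamp=3)  # newer value stays in the memtable
print(len(node.sstables), node.read((2005, 12, 1, 7)))  # 1 -5.0
```

No write ever reads or modifies an existing SSTable; older and newer versions coexist until compaction merges them.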
21. Storage Model - Logical View
 wsid        | hour         | temperature
-------------+--------------+-------------
 10010:99999 | 2005:12:1:10 |        -5.6
 10010:99999 | 2005:12:1:9  |        -5.1
 10010:99999 | 2005:12:1:8  |        -4.9
 10010:99999 | 2005:12:1:7  |        -5.3
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid='10010:99999'
AND year = 2005 AND month = 12 AND day = 1;
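The logical view shows one row per reading, but all four rows share the partition key. A small Python sketch of the regrouping, assuming plain tuples for rows: the partition key is stored once, followed by the clustered cells in order, which is the shape the disk-layout slides below draw.

```python
from collections import defaultdict

# CQL rows as in the logical view: (wsid, year, month, day, hour, temperature).
rows = [
    ("10010:99999", 2005, 12, 1, 10, -5.6),
    ("10010:99999", 2005, 12, 1, 9, -5.1),
    ("10010:99999", 2005, 12, 1, 8, -4.9),
    ("10010:99999", 2005, 12, 1, 7, -5.3),
]

# Group by partition key: the key appears once, followed by its cells
# labelled with the clustering prefix "year:month:day:hour".
partitions = defaultdict(list)
for wsid, year, month, day, hour, temperature in rows:
    partitions[wsid].append((f"{year}:{month}:{day}:{hour}", temperature))

for wsid, cells in partitions.items():
    print(wsid, cells)
```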
22. Storage Model - Disk Layout
10010:99999 | 2005:12:1:10 = -5.6 | 2005:12:1:9 = -5.1 | 2005:12:1:8 = -4.9 | 2005:12:1:7 = -5.3
Merged, Sorted and Stored Sequentially
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid='10010:99999'
AND year = 2005 AND month = 12 AND day = 1;
23. Storage Model - Disk Layout
10010:99999 | 2005:12:1:11 = -4.9 | 2005:12:1:10 = -5.6 | 2005:12:1:9 = -5.1 | 2005:12:1:8 = -4.9 | 2005:12:1:7 = -5.3
Merged, Sorted and Stored Sequentially
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid='10010:99999'
AND year = 2005 AND month = 12 AND day = 1;
24. Storage Model - Disk Layout
10010:99999 | 2005:12:1:12 = -5.4 | 2005:12:1:11 = -4.9 | 2005:12:1:10 = -5.6 | 2005:12:1:9 = -5.1 | 2005:12:1:8 = -4.9 | 2005:12:1:7 = -5.3
Merged, Sorted and Stored Sequentially
SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid='10010:99999'
AND year = 2005 AND month = 12 AND day = 1;
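The hour 11 and hour 12 readings were written later, so they can sit in a different SSTable from hours 7 to 10. Compaction merges such files into one sorted run. A minimal sketch, assuming each hypothetical SSTable is a list of (hour, write_timestamp, temperature) already sorted in clustering order (hour DESC): a k-way merge keeps the output sorted, and when two files hold the same hour the newer timestamp wins.

```python
import heapq

# Two hypothetical SSTables for partition 10010:99999, each sorted hour DESC.
# Entries: (hour, write_timestamp, temperature).
older = [(10, 1, -5.6), (9, 1, -5.1), (8, 1, -4.9), (7, 1, -5.3)]
newer = [(12, 2, -5.4), (11, 2, -4.9)]


def compact(*sstables):
    """Merge sorted SSTables into one sorted run; last write wins per hour."""
    # Order by hour DESC, then newest timestamp first for equal hours.
    merged = heapq.merge(*sstables, key=lambda cell: (-cell[0], -cell[1]))
    result = []
    for hour, ts, temperature in merged:
        if result and result[-1][0] == hour:
            continue  # a newer version of this hour was already kept
        result.append((hour, ts, temperature))
    return result


compacted = compact(older, newer)
print([hour for hour, _, _ in compacted])  # [12, 11, 10, 9, 8, 7]
```

Because every input is already sorted, the merge is a single sequential pass over each file, which is why the slides can describe the result as merged, sorted and stored sequentially.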
26. Query patterns
• Range queries
• “Slice” operation on disk
• Single seek on disk
• Partition key for locality
SELECT wsid,hour,temperature
FROM raw_weather_data
WHERE wsid='10010:99999'
AND year = 2005 AND month = 12 AND day = 1
AND hour >= 7 AND hour <= 10;
10010:99999 | 2005:12:1:10 = -5.6 | 2005:12:1:9 = -5.1 | 2005:12:1:8 = -4.9 | 2005:12:1:7 = -5.3
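Why a range on the last clustering column is one seek: the partition's cells are already sorted, so the matching cells form one contiguous run. A small Python sketch, assuming the partition is a list of (hour, temperature) in hour-DESC order; binary search finds both ends of the run and the slice is read in one pass.

```python
from bisect import bisect_left, bisect_right

# One partition's cells in on-disk clustering order (hour DESC).
cells = [(10, -5.6), (9, -5.1), (8, -4.9), (7, -5.3)]
# bisect needs ascending keys, so index the DESC order by negated hour.
neg_hours = [-hour for hour, _ in cells]


def slice_hours(low, high):
    """Return the contiguous run of cells with low <= hour <= high."""
    start = bisect_left(neg_hours, -high)  # first cell with hour <= high
    end = bisect_right(neg_hours, -low)    # one past the last cell with hour >= low
    return cells[start:end]


print(slice_hours(7, 10))  # all four cells, as one contiguous run
print(slice_hours(8, 9))   # [(9, -5.1), (8, -4.9)]
```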
27. Query patterns
• Range queries
• “Slice” operation on disk
Programmers like this: rows come back already sorted by the clustering columns (year, month, day, hour)
 wsid        | hour         | temperature
-------------+--------------+-------------
 10010:99999 | 2005:12:1:10 |        -5.6
 10010:99999 | 2005:12:1:9  |        -5.1
 10010:99999 | 2005:12:1:8  |        -4.9
 10010:99999 | 2005:12:1:7  |        -5.3
SELECT wsid,hour,temperature
FROM raw_weather_data
WHERE wsid='10010:99999'
AND year = 2005 AND month = 12 AND day = 1
AND hour >= 7 AND hour <= 10;
29. Collections
Set
tags set<varchar>
Callouts: column name (tags); CQL type (varchar), which also sets the sort order of the set's elements.
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
added_date timestamp,
PRIMARY KEY (videoid)
);
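Set semantics in a nutshell, modelled with a Python set as a hypothetical stand-in for the `tags` column: adding a value that is already present changes nothing, as with a CQL update of the form SET tags = tags + {...}, and elements come back in the sort order of the element type rather than insertion order. `sorted()` models that ordering for varchar.

```python
# Hypothetical in-memory stand-in for the tags set<varchar> column.
tags = {"cassandra", "data model", "relational", "instruction"}

# Like SET tags = tags + {'cassandra', 'cql'}: duplicates are absorbed.
tags |= {"cassandra", "cql"}

# Cassandra returns set elements sorted by the element type, not by insertion order.
print(sorted(tags))
# ['cassandra', 'cql', 'data model', 'instruction', 'relational']
```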
30. Collections
Set
List
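Callouts: column name and CQL type, declared as <column name> list<CQL type>; unlike a set, a list keeps insertion order and allows duplicates.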
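The videos table has no list column, so this Python sketch uses a hypothetical list<text> column named `history` only to illustrate list semantics: appends and prepends keep position, duplicates are allowed, elements can be replaced by index, and subtracting a list removes every occurrence of its values. The CQL forms in the comments are the usual update syntax for lists.

```python
# Hypothetical list<text> column; not part of the videos table above.
history = ["uploaded"]

history = history + ["edited"]     # append:  SET history = history + ['edited']
history = ["created"] + history    # prepend: SET history = ['created'] + history
history = history + ["edited"]     # duplicates are allowed and order is kept
history[0] = "draft"               # by index: SET history[0] = 'draft'
# SET history = history - ['edited'] removes every occurrence.
history = [item for item in history if item != "edited"]

print(history)  # ['draft', 'uploaded']
```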
31. Collections
Set
List
Map
preview_thumbnails map<text,text>
Callouts: column name (preview_thumbnails); CQL key type (text); CQL value type (text). Entries are kept sorted by key.
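Map semantics, modelled with a Python dict as a hypothetical stand-in for `preview_thumbnails`: writing one key adds or overwrites only that entry, merging another map adds its entries, deleting a key removes only that entry, and entries come back sorted by key. The "Vimeo" and "Dailymotion" keys and their placeholder URLs are made up for the example.

```python
# Hypothetical in-memory stand-in for preview_thumbnails map<text,text>.
preview_thumbnails = {"YouTube": "http://www.youtube.com/watch?v=px6U2n74q3g"}

preview_thumbnails["Vimeo"] = "vimeo-thumb-url"  # SET preview_thumbnails['Vimeo'] = '...'
preview_thumbnails.update({"Dailymotion": "dm-thumb-url"})  # SET preview_thumbnails = preview_thumbnails + {...}
del preview_thumbnails["Vimeo"]  # DELETE preview_thumbnails['Vimeo'] FROM videos WHERE ...

# Cassandra returns map entries sorted by key.
print(sorted(preview_thumbnails))  # ['Dailymotion', 'YouTube']
```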
32. Aggregates (Sort of)
*As of Cassandra 2.2
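• Built-in: avg, min, max, sum, count(<column name>)
• Runs on the server, not in the client
• Always restrict the query to a partition key; otherwise the aggregate has to scan every partition in the cluster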
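A Python sketch of why the partition key matters for aggregates, assuming a hypothetical table modelled as a dict from partition key to that partition's temperatures (the second weather station is made up). Computing count, min, max and avg for one key touches one partition; leaving out the key would mean visiting every entry.

```python
from statistics import mean

# Hypothetical table contents: partition key (wsid) -> temperatures in that partition.
table = {
    "10010:99999": [-5.6, -5.1, -4.9, -5.3],
    "72530:94846": [3.2, 4.0],  # hypothetical second weather station
}


def aggregate(wsid):
    """Mimic SELECT count(temperature), min(...), max(...), avg(...) WHERE wsid = ?.

    Only the requested partition is read.
    """
    temps = table[wsid]
    return len(temps), min(temps), max(temps), mean(temps)


count, low, high, avg = aggregate("10010:99999")
print(count, low, high, round(avg, 3))
```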
33. User Defined Functions
CREATE FUNCTION maxI(current double, candidate double)
CALLED ON NULL INPUT
RETURNS double LANGUAGE java AS
'if (current == null) return candidate; if (candidate == null) return current; return Math.max(current, candidate);' ;
CREATE AGGREGATE maxAgg(double)
SFUNC maxI
STYPE double
INITCOND null;
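Callouts: arguments and return value use CQL types; maxI is a pure function (same output for the same input, no side effects).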
SELECT maxAgg(temperature)
FROM raw_weather_data
WHERE wsid='10010:99999'
AND year = 2005 AND month = 12 AND day = 1;
Aggregate using a function over a partition: maxAgg applies maxI to each selected row in turn.
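A user-defined aggregate is a fold: the state starts at INITCOND, SFUNC is called with (state, value) for each row, and with no FINALFUNC the last state is the result. The Python sketch below mirrors that flow. `max_i` follows the spirit of maxI; its guard for a null candidate is an assumption added so that null column values are skipped.

```python
from functools import reduce


def max_i(current, candidate):
    """Python counterpart of maxI: a null state takes the candidate."""
    if current is None:
        return candidate
    if candidate is None:
        return current
    return max(current, candidate)


def run_aggregate(sfunc, initcond, values):
    """Fold SFUNC over the column values, starting from INITCOND.

    With no FINALFUNC, the final state is the aggregate's result.
    """
    return reduce(sfunc, values, initcond)


temperatures = [-5.6, -5.1, -4.9, -5.3]  # the partition's rows for 2005-12-01
print(run_aggregate(max_i, None, temperatures))  # -4.9
```

This is also why the state function should be pure: the fold may be evaluated row by row, and only the state is carried between calls.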
34. Lightweight Transactions
Don’t overwrite!
INSERT INTO videos (videoid, name, userid, description, location, location_type, preview_thumbnails, tags, added_date)
VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.',
9761d3d7-7fbd-4269-9988-6cfd4e188678,
'First in a three part series for Cassandra Data Modeling','http://www.youtube.com/watch?v=px6U2n74q3g',1,
{'YouTube':'http://www.youtube.com/watch?v=px6U2n74q3g'},{'cassandra','data model','relational','instruction'},
'2013-05-02 12:30:29')
IF NOT EXISTS;
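What the client sees from IF NOT EXISTS is a compare-and-set. A hypothetical single-process Python model follows; real lightweight transactions run a Paxos round among the replicas, which this does not attempt. When the row already exists, the insert is not applied and the current row is reported back instead of being overwritten.

```python
class ToyTable:
    """Hypothetical model of the outcome of INSERT ... IF NOT EXISTS."""

    def __init__(self):
        self.rows = {}

    def insert_if_not_exists(self, key, row):
        """Apply the insert only if `key` is absent; return (applied, existing_row)."""
        if key in self.rows:
            return False, self.rows[key]  # not applied; existing row reported back
        self.rows[key] = row
        return True, None


videos = ToyTable()
vid = "06049cbb-dfed-421f-b889-5f649a0de1ed"
print(videos.insert_if_not_exists(vid, {"name": "The data model is dead. Long live the data model."}))
print(videos.insert_if_not_exists(vid, {"name": "A different title"}))
```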
35. Lightweight Transactions
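No-op if the table already exists; no error is thrown. (IF NOT EXISTS on CREATE TABLE is a schema check, not a Paxos-based lightweight transaction.)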
CREATE TABLE IF NOT EXISTS videos_by_tag (
tag text,
videoid uuid,
added_date timestamp,
name text,
preview_image_location text,
tagged_date timestamp,
PRIMARY KEY (tag, videoid)
);
36. Regular Update
UPDATE videos
SET name = 'The data model is dead. Long live the data model.'
WHERE videoid = 06049cbb-dfed-421f-b889-5f649a0de1ed;
Callouts: table name (videos); fields to update (name) must not be part of the primary key; primary key (videoid) in the WHERE clause.
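A regular UPDATE is a blind write: there is no read first, a missing row is simply created, and each cell keeps its write timestamp so the highest timestamp wins even when writes arrive out of order. A minimal Python sketch under those assumptions; the "Draft title" value is hypothetical, and real tie-breaking on equal timestamps is left out for brevity.

```python
def upsert(rows, key, column, value, timestamp):
    """Blind write: no read-before-write; a missing row is created."""
    cells = rows.setdefault(key, {})
    current = cells.get(column)
    # Highest write timestamp wins; equal timestamps are not handled here.
    if current is None or timestamp > current[0]:
        cells[column] = (timestamp, value)


rows = {}
vid = "06049cbb-dfed-421f-b889-5f649a0de1ed"
upsert(rows, vid, "name", "The data model is dead. Long live the data model.", timestamp=2)
upsert(rows, vid, "name", "Draft title", timestamp=1)  # arrives late with an older timestamp: ignored
print(rows[vid]["name"][1])  # The data model is dead. Long live the data model.
```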
37. Lightweight Transactions
Don’t overwrite!
UPDATE videos
SET name = 'The data model is dead. Long live the data model.'
WHERE videoid = 06049cbb-dfed-421f-b889-5f649a0de1ed
IF userid = 9761d3d7-7fbd-4269-9988-6cfd4e188678;
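UPDATE ... IF column = value applies the change only when the current value matches; otherwise nothing is written and the client gets the current values of the condition columns back. A hypothetical in-memory Python model of that outcome (no Paxos); the "Draft title" and "some-other-user" values are made up.

```python
VID = "06049cbb-dfed-421f-b889-5f649a0de1ed"
OWNER = "9761d3d7-7fbd-4269-9988-6cfd4e188678"


def update_if(rows, key, updates, **conditions):
    """Apply `updates` to rows[key] only if every condition column matches.

    Returns (applied, current values of the condition columns on failure).
    """
    row = rows.get(key)
    if row is None:
        return False, {}  # nothing to compare against: not applied
    current = {col: row.get(col) for col in conditions}
    if current != conditions:
        return False, current  # not applied; report what was found
    row.update(updates)
    return True, {}


rows = {VID: {"userid": OWNER, "name": "Draft title"}}
print(update_if(rows, VID, {"name": "The data model is dead. Long live the data model."}, userid=OWNER))
print(update_if(rows, VID, {"name": "Hijacked"}, userid="some-other-user"))
```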
40. Expiring Data
Time To Live = TTL
INSERT INTO videos (videoid, name, userid, description, location, location_type, preview_thumbnails, tags, added_date)
VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.',
9761d3d7-7fbd-4269-9988-6cfd4e188678,
'First in a three part series for Cassandra Data Modeling','http://www.youtube.com/watch?v=px6U2n74q3g',1,
{'YouTube':'http://www.youtube.com/watch?v=px6U2n74q3g'},{'cassandra','data model','relational','instruction'},
'2013-05-02 12:30:29')
USING TTL 2592000;
Expire Data: 30 Days (30 × 24 × 60 × 60 = 2,592,000 seconds)
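TTL is measured in seconds from the time of the write. A small Python check of the arithmetic, assuming (hypothetically) that the row was written at the added_date value in UTC; once the TTL has elapsed the cells are treated as deleted and later purged by compaction.

```python
from datetime import datetime, timedelta, timezone

TTL_SECONDS = 30 * 24 * 60 * 60  # 2592000, the value used in USING TTL above


def is_expired(written_at, ttl_seconds, now):
    """A cell expires once ttl_seconds have passed since it was written."""
    return now >= written_at + timedelta(seconds=ttl_seconds)


# Hypothetical write time: the added_date value, taken as UTC.
written = datetime(2013, 5, 2, 12, 30, 29, tzinfo=timezone.utc)
expires = written + timedelta(seconds=TTL_SECONDS)
print(expires.isoformat())  # 2013-06-01T12:30:29+00:00
```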
42. Oracle to Cassandra Core Concepts Guide Pt. 3
Tired of timeouts? Cursing your cursors? Join the distributed revolution and bring
your dev team into application nirvana. You won’t believe how easy it is to be code
complete on your next big project. We will show you how to lead your devs away
from the clutches of the DBA and put them in control of their own data destiny. Discover
the methodology that will make your Cassandra project epic.
Stay tuned!