A lot has changed since I last gave one of these talks, and man, has it been good. Cassandra 2.0 brought us a lot of new CQL features, and now with 2.1 we get even more! Let me show you some real-life data models and the new features taking developer productivity to an all-new high: User Defined Types, new counters, paging, and static columns. Exciting new ways of making your app truly killer!
6. KillrVideo.com
• 2012 Summit
• Complete example for data modeling
[Figure: mockup of a video page at www.killrvideos.com showing Video Title, Description, Comments, Username, Rating, Tags (Foo, Bar), an "Upload New!" button, Recommended videos, and Ads by Google. Cat drawing by goodrob13 on Flickr.]
8. Data Model - Revisited
• Add in some 2.1 data models
• Replace (or remove) some app code
• Become a part of Cassandra OSS download
9. User Defined Types
• Complex data in one place
• No multi-gets (multi-partitions)
• Nesting!
CREATE TYPE address (
street text,
city text,
zip_code int,
country text,
cross_streets set<text>
);
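To make the "Nesting!" bullet concrete, here is a minimal sketch that wraps the address type in a second type and stores it in a table. The contact_info type, the user_contacts table, and every value are hypothetical and not part of KillrVideo. It assumes Cassandra 2.1, where a UDT used inside another type or as a column must be frozen.

```cql
-- Hypothetical outer type: nests the address type defined above.
CREATE TYPE contact_info (
    email text,
    home frozen<address>
);

-- Hypothetical table holding the nested type in a single column.
CREATE TABLE user_contacts (
    userid uuid PRIMARY KEY,
    contact frozen<contact_info>
);

-- UDT literals use { field: value } syntax and can be nested.
INSERT INTO user_contacts (userid, contact)
VALUES (
    62c36092-82a1-3a00-93d1-46196ee77204,
    {
        email: 'user@example.com',
        home: {
            street: '123 Main St',
            city: 'Santa Clara',
            zip_code: 95054,
            country: 'USA',
            cross_streets: {'First St', 'Second St'}
        }
    }
);

-- One partition read returns the whole nested structure.
SELECT contact
FROM user_contacts
WHERE userid = 62c36092-82a1-3a00-93d1-46196ee77204;
```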
10. Before
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
added_date timestamp,
PRIMARY KEY (videoid)
);
CREATE TABLE video_metadata (
video_id uuid PRIMARY KEY,
height int,
width int,
video_bit_rate set<text>,
encoding text
);
SELECT *
FROM videos
WHERE videoId = 2;

SELECT *
FROM video_metadata
WHERE video_id = 2;

In-application join: the app runs both queries and merges the results to render the page.
Title: Introduction to Apache Cassandra
Description: A one hour talk on everything you need to know about a totally amazing database.
Playback rate: 480 720
11. After
• Now video_metadata is embedded in videos
CREATE TYPE video_metadata (
height int,
width int,
video_bit_rate set<text>,
encoding text
);
CREATE TABLE videos (
videoid uuid,
userid uuid,
name varchar,
description varchar,
location text,
location_type int,
preview_thumbnails map<text,text>,
tags set<varchar>,
metadata set <frozen<video_metadata>>,
added_date timestamp,
PRIMARY KEY (videoid)
);
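With the metadata embedded, one write and one read cover what used to take two tables and an in-application join. A rough sketch against the videos table above; the uuid, dimensions, bit rates, and encoding values are made up for illustration.

```cql
-- Store the video and all of its encodings in one statement.
-- metadata is a set of frozen UDTs, so each element is a UDT literal.
INSERT INTO videos (videoid, name, description, metadata)
VALUES (
    99051fe9-6a9c-46c2-b949-38ef78858dd0,
    'Introduction to Apache Cassandra',
    'A one hour talk on everything you need to know about a totally amazing database.',
    {
        {height: 480, width: 640, video_bit_rate: {'1000kbps'}, encoding: 'MP4'},
        {height: 720, width: 1280, video_bit_rate: {'2500kbps'}, encoding: 'MP4'}
    }
);

-- A single partition read returns the video together with its metadata.
SELECT name, description, metadata
FROM videos
WHERE videoid = 99051fe9-6a9c-46c2-b949-38ef78858dd0;
```

Because each element is frozen, changing one field of an encoding means removing the old element from the set and adding a new one.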
12. Wait! Frozen??
• Staying out of technical debt
• 3.0 UDTs will not have to be frozen
• Applicable to User Defined Types and Tuples (wait for
Do you want to build a schema?
Do you want to store some JSON?
23. Static Fields
• New as of 2.0.6
• VERY specific, but useful
• Thrift people will like this
CREATE TABLE t (
k text,
s text STATIC,
i int,
PRIMARY KEY (k, i)
);
24. Why?
CREATE TABLE weather (
id int,
time timestamp,
weatherstation_name text,
temperature float,
PRIMARY KEY (id, time)
);
Without a static column, the station name is stored again in every row:
ID = 1 (Partition Key / Storage Row Key)
Partition Row 1: 2014-09-08 12:00:00 : name = SFO | 2014-09-08 12:00:00 : temp = 63.4
Partition Row 2: 2014-09-08 12:01:00 : name = SFO | 2014-09-08 12:01:00 : temp = 63.9
Partition Row 3: 2014-09-08 12:02:00 : name = SFO | 2014-09-08 12:02:00 : temp = 64.0
With a static column, the station name is stored once per partition:
ID = 1 (Partition Key / Storage Row Key)
name = SFO
Partition Row 1: 2014-09-08 12:00:00 : temp = 63.4
Partition Row 2: 2014-09-08 12:01:00 : temp = 63.9
Partition Row 3: 2014-09-08 12:02:00 : temp = 64.0
CREATE TABLE weather (
id int,
time timestamp,
weatherstation_name text static,
temperature float,
PRIMARY KEY (id, time)
);
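A short sketch of how the static version of weather behaves in practice, reusing the readings from the slide. The replacement station name in the last statement is invented for illustration.

```cql
-- Write the station name once for the whole partition (no clustering key needed).
INSERT INTO weather (id, weatherstation_name) VALUES (1, 'SFO');

-- Readings carry only the per-row data.
INSERT INTO weather (id, time, temperature) VALUES (1, '2014-09-08 12:00:00', 63.4);
INSERT INTO weather (id, time, temperature) VALUES (1, '2014-09-08 12:01:00', 63.9);
INSERT INTO weather (id, time, temperature) VALUES (1, '2014-09-08 12:02:00', 64.0);

-- Every row returned for id = 1 shows the same shared station name.
SELECT time, weatherstation_name, temperature
FROM weather
WHERE id = 1;

-- Changing the static value touches one cell and is visible on all rows.
UPDATE weather SET weatherstation_name = 'San Francisco Intl' WHERE id = 1;
```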
25. Usage
• Put static at the end of the column declaration
• Can’t be a part of the PRIMARY KEY:
CREATE TABLE video_event (
videoid uuid,
userid uuid,
preview_image_location text static,
event varchar,
event_timestamp timeuuid,
video_timestamp bigint,
PRIMARY KEY ((videoid,userid),event_timestamp,event)
) WITH CLUSTERING ORDER BY (event_timestamp DESC,event ASC);
26. Tuples
CREATE TABLE tuple_table (
id int PRIMARY KEY,
three_tuple frozen <tuple<int, text, float>>,
four_tuple frozen <tuple<int, text, float, inet>>,
five_tuple frozen <tuple<int, text, float, inet, ascii>>
);
• A type that represents a group
• Up to 256 different elements
27. Example Usage
• Track a drone’s position
• x, y, z in a 3D Cartesian space
CREATE TABLE drone_position (
droneId int,
time timestamp,
position frozen <tuple<float, float, float>>,
PRIMARY KEY (droneId, time)
);
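Writing and reading a position might look like the sketch below. The coordinates are made up, and the tuple literal is simply a parenthesized list of values in declaration order.

```cql
-- Record one position sample; the tuple is written as (x, y, z).
INSERT INTO drone_position (droneId, time, position)
VALUES (1, '2014-09-08 12:00:00', (10.5, 20.25, 30.0));

-- Read the drone's track; each row returns the whole tuple.
SELECT time, position
FROM drone_position
WHERE droneId = 1;
```

Since the tuple is frozen, it is always written and read as a single value; updating only z means rewriting the whole position.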
28. What about partition size?
• A CQL partition is a logical projection of a storage row
• Storage rows can have up to 2 billion cells
• Each cell can hold up to 2 GB of data
29. How much is too much?
• How many cells before performance degrades?
• How many bytes per partition before it’s unmanageable?
• What is “practical”?
30. Old answer
• 2011: Pre-Cassandra 1.2 (actually tested on 0.8)
• Aaron Morton, Cassandra MVP and Founder of The Last Pickle
31. Conclusion
• Keep partition (storage row) length < 10k cells
• Total size in bytes below 64 MB (Multi-pass compaction)
• Multiple hits to 64 KB page size will start to hurt
TL;DR - It’s a performance tunable
32. The tests revisited
• Attempted to reproduce the same tests using CQL
• Cassandra 2.1, 2.0 and 1.2
• Tested partition sizes:
1. 100
2. 2114
3. 5,000
4. 10,000
5. 100,000
6. 1,000,000
7. 10,000,000
8. 100,000,000
9. 1,000,000,000