C* Data Modeling
Travis Price, Solution Architect
Agenda
• C* Schema Overview
• Columns and their components
• Column Families
• Keyspaces
• Designing a Data Model
• Partitioning
• Indexing
• Keys
• Example Models
• Shopping Cart
• User Activity
• Logging
• Form Versioning
• Advanced Modeling
• LWT
• Indexing by Table
Cassandra Architecture Review
©2013 DataStax Confidential. Do not distribute without consent. 3
[Figure: clients reading from and writing to a multi-data center Cassandra cluster]
Terms
Cluster
Datacenter
Node
Keyspace
Column Family
Partition
Storage row vs. CQL row
Snitch
Replication Factor
Consistency
Cassandra Architecture
System and hardware failures can and do happen
Peer-to-peer, distributed system
All nodes identical – Gossip status and state
Data partitioned among nodes
Data replication ensures fault tolerance
Read/write anywhere design
[Figure: ring of eight nodes with tokens 10, 20, 30, 40, 50, 60, 70 and 80]
Keyspaces and Tables
Schema is a row-oriented, column structure
A keyspace is akin to a database
Tables are more flexible/dynamic
A row in a table is indexed by its key
Other columns may be indexed as well
[Figure: a keyspace containing a table with columns Key, Name, SSN and DOB]
Writing Data
Data is partitioned by token
Nodes own token and range back to previous token
Nodes may own multiple tokens (rf, vnodes)
Client connects to any node
Node takes on role of coordinator during transaction
hash(key) => token, rd/wr node(s)
Cluster owns all data – typically, each DC does, too
Nodes own slices of data - one, or more
Replicas can be rack-aware
[Figure: a client writes a key to the ring; hash(key)=43]
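The ownership rule above can be sketched in a few lines of Python. This is only an illustration, assuming the eight-node ring from the slides (tokens 10 through 80); the `owner` helper is hypothetical and not part of any Cassandra API.

```python
from bisect import bisect_left

# Node tokens taken from the ring drawn on the slides.
RING = [10, 20, 30, 40, 50, 60, 70, 80]

def owner(token, ring=RING):
    """Return the token of the node that owns `token`.

    A node owns its own token and the range back to the previous
    token. The ring wraps, so anything above 80 belongs to node 10.
    """
    i = bisect_left(ring, token)
    return ring[i % len(ring)]

print(owner(43))  # hash(key)=43 lands on the node with token 50
```

Any node can act as coordinator: it computes the token, then forwards the write to the owning node(s).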
Replication and DCs
[Figure: a client writes a key with hash(key)=43 to a cluster of two rings: Data Center 1 (node tokens 10 to 80, rf=3) and Data Center 2 (node tokens 11 to 81, rf=3)]
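A rough sketch of how rf=3 could translate into replica nodes in each data center. It assumes the simplest placement (the owner plus the next nodes clockwise, ignoring racks); rack-aware placement may choose differently. The `replicas` helper and the ring lists are illustrative only.

```python
from bisect import bisect_left

# Node tokens read off the two rings on the slide.
DC1 = [10, 20, 30, 40, 50, 60, 70, 80]
DC2 = [11, 21, 31, 41, 51, 61, 71, 81]

def replicas(token, rf, ring):
    """Owner of `token` plus the next rf-1 nodes clockwise on the ring."""
    start = bisect_left(ring, token)
    return [ring[(start + i) % len(ring)] for i in range(rf)]

print(replicas(43, 3, DC1))  # [50, 60, 70]
print(replicas(43, 3, DC2))  # [51, 61, 71]
```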
Tunable Consistency
Choose between strong and eventual consistency (one to all nodes acknowledging) depending upon the need
Can be configured on a per-operation basis, and for both reads and writes
Handles multi-data center transparently
CLwr + CLrd > RF yields strong consistency (CLwr and CLrd are the number of replicas that must acknowledge a write and a read)
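The CLwr + CLrd > RF rule is simple arithmetic, so a small Python check may help. The function names are made up for illustration; consistency levels are expressed here as replica counts.

```python
def quorum(rf):
    """Replica count for a QUORUM operation: a majority of rf."""
    return rf // 2 + 1

def is_strongly_consistent(cl_write, cl_read, rf):
    """True when every read set must overlap every write set."""
    return cl_write + cl_read > rf

rf = 3
print(is_strongly_consistent(quorum(rf), quorum(rf), rf))  # True: 2 + 2 > 3
print(is_strongly_consistent(1, 1, rf))                    # False: 1 + 1 <= 3
```

With rf=3, writing and reading at QUORUM overlaps on at least one replica, while ONE/ONE trades that guarantee for lower latency.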
CQL Language
Familiar SQL syntax without joins, group by, etc
Create objects via DDL (e.g. CREATE...)
Core DML commands supported (e.g. INSERT, UPDATE, DELETE)
Query data with SELECT
Read / Write Path
[Figure: the read path and the write path across memory (heap) and disk. Components shown: bloom filters, key cache, row cache, OS cache, commit log (1/node), memtables (1/cf) and sstables (n/cf), each holding rows of the form key; col1=val1,..coln=valn]
The Cassandra Schema
Consists of:
• Column
• Column Family
• Keyspace
• Cluster
High Level Overview
Keyspace
Column Family /Table
Rows
Columns
Components of the Column
The column is the fundamental data
type in Cassandra and includes:
• Column name
• Column value
• Timestamp
• TTL (Optional)
The Column
Name
Value
Timestamp
(Name: “firstName”, Value: “Travis”, Timestamp: 1363106500)
Column Name
• Can be any value
• Can be any type
• Not optional
• Must be unique
• Stored with every value
Column Value
• Any value
• Any type
• Can be empty – but is required
Column Names and Values
• The data type for a column value is called a validator.
• The data type for a column name is called a comparator.
• Cassandra also validates the data type of row keys.
• Columns are sorted, and stored in sorted order on disk, so you have to specify a comparator for columns. This can be reversed… more on this later
Data Types
Column TimeStamp
• 64-bit integer
• Best Practice
– Should be created in a consistent manner by all your clients
• Required
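Why consistent client timestamps matter: when two versions of the same column meet, the one with the higher timestamp survives. The sketch below is a simplified Python model of that rule (tie-breaking is glossed over); `Column` and `reconcile` are illustrative names, not Cassandra classes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # 64-bit integer supplied by the client

def reconcile(a, b):
    """Keep the version with the highest timestamp (last write wins)."""
    return a if a.timestamp >= b.timestamp else b

old = Column("firstName", "Travis", 1363106500)
new = Column("firstName", "Trav", 1363106600)
print(reconcile(old, new).value)  # Trav
```

A client with a skewed clock can therefore silently lose to an older write.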
Column TTL
• Defined on INSERT or UPDATE (USING TTL)
• Positive delay (in seconds)
• After time expires it is marked for deletion
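TTL values are plain seconds, so they are easy to get wrong by a factor of 60 or 24. A small Python helper (hypothetical names, for illustration only) makes the arithmetic explicit:

```python
from datetime import datetime, timedelta

def ttl_seconds(days):
    """Convert a number of days into a TTL in seconds."""
    return int(timedelta(days=days).total_seconds())

def is_expired(written_at, ttl, now):
    """True once `ttl` seconds have passed since the write."""
    return now >= written_at + timedelta(seconds=ttl)

print(ttl_seconds(30))  # 2592000
```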
Special Types of Columns
• Counter
• Collections
Counters
• Allows for addition / subtraction
• 64-bit value
• No timestamp
update recommendation_summary
set num_products = num_products + 1
where recommendation = 'highly recommend';
Cassandra feature - Collections
• Collections give you three types:
- Set
- List
- Map
• Each allows for dynamic updates
• Fully supported in CQL 3
• Requires serialization
CREATE TABLE collections_example (
id int PRIMARY KEY,
set_example set<text>,
list_example list<text>,
map_example map<int,text>
);
Cassandra Collections - Set
• Set is sorted by CQL type comparator
INSERT INTO collections_example (id, set_example)
VALUES(1, {'1-one', '2-two'});
set_example set<text>
(collection name: set_example; collection type: set; CQL type: text)
Cassandra Collections - Set Operations
• Adding an element to the set
UPDATE collections_example
SET set_example = set_example + {'3-three'} WHERE id = 1;
• After adding this element, it will sort to the beginning.
UPDATE collections_example
SET set_example = set_example + {'0-zero'} WHERE id = 1;
• Removing an element from the set
UPDATE collections_example
SET set_example = set_example - {'3-three'} WHERE id = 1;
Cassandra Collections - List
• Ordered by insertion
INSERT INTO collections_example (id, list_example)
VALUES(1, ['1-one', '2-two']);
list_example list<text>
(collection name: list_example; collection type: list; CQL type: text)
Cassandra Collections - List Operations
• Adding an element to the end of a list
UPDATE collections_example
SET list_example = list_example + ['3-three'] WHERE id = 1;
• Adding an element to the beginning of a list
UPDATE collections_example
SET list_example = ['0-zero'] + list_example WHERE id = 1;
• Deleting an element from a list
UPDATE collections_example
SET list_example = list_example - ['3-three'] WHERE id = 1;
Cassandra Collections - Map
• Key and value
• Key is sorted by CQL type comparator
INSERT INTO collections_example (id, map_example)
VALUES(1, { 1 : 'one', 2 : 'two' });
map_example map<int,text>
(collection name: map_example; collection type: map; key CQL type: int; value CQL type: text)
Cassandra Collections - Map Operations
• Add an element to the map
UPDATE collections_example
SET map_example[3] = 'three' WHERE id = 1;
• Update an existing element in the map
UPDATE collections_example
SET map_example[3] = 'tres' WHERE id = 1;
• Delete an element in the map
DELETE map_example[3]
FROM collections_example WHERE id = 1;
User model
• Enhance a simple user table
• Great for static + some dynamic
• Takes advantage of row level isolation
CREATE TABLE user_with_location (
username text PRIMARY KEY,
first_name text,
last_name text,
address1 text,
city text,
postal_code text,
last_login timestamp,
location_by_date map<timeuuid,text>
);
User profile - Operations
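• Adding new login locations to the map
// last_login is a timestamp, so the timeuuid from now() is converted
// (on releases before 2.2, use dateOf(now()) instead of toTimestamp(now()))
UPDATE user_with_location
SET last_login = toTimestamp(now()), location_by_date = location_by_date + {now() : '123.123.123.1'}
WHERE username='PatrickMcFadin';
• Adding new login locations to the map + TTL!
UPDATE user_with_location
USING TTL 2592000 // 30 Days
SET last_login = toTimestamp(now()), location_by_date = location_by_date + {now() : '123.123.123.1'}
WHERE username='PatrickMcFadin';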
Column Families / Tables
• Similar to tables
• Groupings of Rows
• Tunable Consistency
• De-Normalization
• To avoid I/O
• Simplify the Read Path
• Static or Dynamic
Static Column Families
• Are the most similar to a relational table
• Most rows have the same column names
• Columns in rows can be different
Row Key  | Name     | Email     | Address    | State
---------+----------+-----------+------------+-------
jbellis  | Jonathan | jb@ds.com | 123 main   | TX
dhutch   | Daria    | dh@ds.com | 45 2nd St. | CA
egilmore | eric     | eg@ds.com |            |
Dynamic Column Families
• Also called “wide rows”
• Structured so a query into the row will answer a question
Row Key  | Columns
---------+--------------------------------------
jbellis  | dhutch, egilmore, datastax, mzcassie
dhutch   | egilmore
egilmore | datastax, mzcassie
Dynamic Table CQL3 Example
Clustering Order (Comparators)
• Sorts columns on disk by default
• Can change the order
Keyspaces
• Are groupings of Column Families
• Replication strategies
• Replication factor
CREATE KEYSPACE demodb
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 3};
ALTER KEYSPACE dsg
WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 3};
Replication Strategies
Two replication strategies are available:
• SimpleStrategy: Use for a single data center only. If you ever intend more
than one data center, use the NetworkTopologyStrategy.
• NetworkTopologyStrategy: Highly recommended for most deployments
because it is much easier to expand to multiple data centers when
required by future expansion.
Snitches
• A snitch determines which data centers and racks are written to and read
from.
• Snitches inform Cassandra about the network topology so that requests are routed efficiently, and allow Cassandra to distribute replicas by grouping machines into data centers and racks. All nodes must have exactly the same snitch configuration. Cassandra does its best not to have more than one replica on the same rack (which is not necessarily a physical location).
Snitch Types
SimpleSnitch
• single-data center deployments (or single-zone in public clouds)
RackInferringSnitch
• determines the location of nodes by rack and data center, which are assumed to correspond to the 3rd and 2nd
octet of the node's IP address
PropertyFileSnitch
• user-defined description of the network details
• cassandra-topology.properties file
GossipingPropertyFileSnitch
• defines a local node's data center and rack
• uses gossip for propagating this information to other nodes
• conf/cassandra-rackdc.properties
Amazon Snitches
• Ec2Snitch
• Ec2MultiRegionSnitch
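To make the RackInferringSnitch rule concrete, here is a small Python sketch of the octet convention described above (2nd octet = data center, 3rd octet = rack). The `infer_location` function is purely illustrative and is not Cassandra code.

```python
def infer_location(ip):
    """Derive data center and rack from an IPv4 address by octet position."""
    octets = ip.split(".")
    return {"datacenter": octets[1], "rack": octets[2]}

print(infer_location("10.20.30.40"))  # {'datacenter': '20', 'rack': '30'}
```

This only works when the network addressing plan actually follows that layout, which is why the property-file snitches exist.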
Complex Queries
Partitioning and Indexing
Partitioners
Generates the token for a key, determining which node owns the record
Cassandra offers the following partitioners:
• Murmur3Partitioner (default): uniformly distributes data across the
cluster based on MurmurHash hash values.
• RandomPartitioner: uniformly distributes data across the cluster based
on MD5 hash values.
• ByteOrderedPartitioner: keeps an ordered distribution of data lexically by
key bytes
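A loose Python illustration of the difference between a hashing partitioner and a byte-ordered one. It is modeled on the MD5 idea behind RandomPartitioner but is not guaranteed to reproduce Cassandra's exact token values; `md5_token` is a made-up helper.

```python
import hashlib

def md5_token(key: bytes) -> int:
    """Hash a key to a non-negative integer token (MD5-based sketch)."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

keys = [b"jbellis", b"dhutch", b"egilmore"]

# Byte-ordered: token order is the same as key order.
by_bytes = sorted(keys)

# Hashed: token order has no relation to key order, which spreads load evenly.
by_token = sorted(keys, key=md5_token)

print(by_bytes)
print(by_token)
```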
Partitioners – paging through all rows
CQL 3 forbids a range query on the partition key, such as k > 42, unless the partitioner in use is ordered. Even when using the random partitioner or the murmur3 partitioner, it can sometimes be useful to page through all rows. For this purpose, CQL 3 includes the token function:
• SELECT * FROM test WHERE token(k) > token(42);
The ByteOrdered partitioner arranges tokens the same way as key values, but the RandomPartitioner and Murmur3Partitioner distribute tokens in a completely unordered manner. The token function makes it possible to page through unordered partitioner results. Using the token function actually queries results directly using tokens: underneath, it compares token(k) with token(42), which is not the same as comparing k > 42.
Primary Index Overview
• Index for all of your row keys
• Per-node index
• Partitioner + placement manages which node
• Keys are just kept in ordered buckets
• Partitioner determines how K → Token
Natural Keys
Examples:
• An email address
• A user id
• Easy to make the relationship
• Less de-normalization
• More risk of an 'UPSERT'
• Changing the key requires a bulk copy operation
Surrogate Keys
• Example:
• UUID
• Independently generated
• Allows you to store multiple versions of a user
• Relationship is now indirect
• Changing the key requires the creation of a new row, or column
Compound (Composite) Primary Keys
Key Structure
CREATE TABLE playlists (
id uuid,
song_id uuid,
title text,
album text,
artist text,
PRIMARY KEY (id, song_id)
);
Partition (row) key: id. Clustering key: song_id.
Sorting
• It's Free!
• ONLY on the second column in compound Primary Key
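One way to picture a compound primary key is as a map of partitions, each holding rows kept in clustering-key order. The Python model below is a deliberately simplified sketch (small integers stand in for the uuid columns, and the names are hypothetical); it also shows why a second insert with the same key is an upsert.

```python
from collections import defaultdict

# partition key (id) -> {clustering key (song_id) -> row}
playlists = defaultdict(dict)

def insert(playlist_id, song_id, title):
    """Write a row; an existing (id, song_id) pair is overwritten."""
    playlists[playlist_id][song_id] = {"title": title}

def select(playlist_id):
    """Read one partition; rows come back sorted by clustering key."""
    partition = playlists[playlist_id]
    return [(song_id, partition[song_id]["title"]) for song_id in sorted(partition)]

insert(1, 30, "Third song")
insert(1, 10, "First song")
insert(1, 30, "Third song (remastered)")  # same key: upsert, not a duplicate

print(select(1))  # [(10, 'First song'), (30, 'Third song (remastered)')]
```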
Composite Partition Keys
storeid and eventid are used to create the token
CREATE TABLE pricechange(
storeid int,
eventid int,
sku text,
oldprice double,
newprice double,
PRIMARY KEY ((storeid, eventid), sku)
);
Secondary Indexes
• Need for an easy way to do limited ad-hoc queries
• Supports multiple per row
• Single clause can support multiple selectors
• Implemented as a hash map, not B-Tree
• Low cardinality ONLY
Secondary Indexes
Conditional Operators
ALLOW FILTERING
CREATE TABLE users (
username text PRIMARY KEY,
firstname text,
lastname text,
birth_year int,
country text
);
CREATE INDEX ON users(birth_year);
Valid Queries:
SELECT * FROM users;
SELECT firstname, lastname FROM users WHERE birth_year = 1981;
Invalid query:
SELECT firstname, lastname FROM users WHERE birth_year = 1981 AND country = 'FR';
Valid only with ALLOW FILTERING:
SELECT firstname, lastname FROM users WHERE birth_year = 1981 AND country = 'FR'
ALLOW FILTERING;
Cassandra cannot guarantee that it won't have to scan a large amount of data even if the result of this query is small. Typically, it will scan all the index entries for users born in 1981 even if only a handful are actually from France.
Designing a Data Model
The Basics of C* Modeling
• Work backwards
• What does your application do?
• What are the access patterns?
• Now design your data model
Procedures
Consider use case requirements
• What data?
• Ordering?
• Filtering?
• Grouping?
• Events in chronological order?
• Does the data expire?
De-Normalization
• De-Normalize in C*
• Forget third normal form
• Data Duplication is OK
• Concerns
• Resource contention
• Latency
• Client-side joins
• Avoid them in your C* code
Foreign Keys
• There are no foreign keys
• No server-side joins
Table Design
• Ideally each query will be one row
• Compared to other resources, disk space is cheap
• Reduce disk seeks
• Reduce network traffic
Workload Preference
• High level of de-normalization means you may have to write the same data many times
• Cassandra handles large numbers of writes well
• If given the choice:
• Read once
• Write many
Concurrent Writes
• A row is always referenced by a Key
• Keys are just bytes
• They must be unique within a CF
• Primary keys are unique
• But Cassandra will not enforce uniqueness
• If you are not careful you will accidentally [UPSERT] the whole thing
The 5 C* Commandments for Developers
1. Start with queries. Don't data model for data modeling's sake.
2. It's ok to duplicate data.
3. C* is designed to read and write sequentially. Great for rotational disk, awesome for SSDs, awful for NAS. If your disk has an Ethernet port, it's not good for C*.
4. Use secondary indexes strategically and cautiously.
5. Embrace wide rows and de-normalization
Online stores require 100% uptime
Shopping Cart Data Model
Shopping cart use case
* Store shopping cart data reliably
* Minimize (or eliminate) downtime. Multi-dc
* Scale for the “Cyber Monday” problem
* Every minute off-line is lost $$
* Online shoppers want speed!
The bad
Shopping cart data model
* Each customer can have one or more shopping carts
* De-normalize data for fast access
* Shopping cart == One partition (Row Level Isolation)
* Each item a new column
Shopping cart data model
CREATE TABLE user (
username varchar,
firstname varchar,
lastname varchar,
shopping_carts set<varchar>,
PRIMARY KEY (username)
);
CREATE TABLE shopping_cart (
username varchar,
cart_name text,
item_id int,
item_name varchar,
description varchar,
price float,
item_detail map<varchar,varchar>,
PRIMARY KEY ((username,cart_name),item_id)
);
INSERT INTO shopping_cart
(username,cart_name,item_id,item_name,description,price,item_detail)
VALUES ('pmcfadin','Gadgets I want',8675309,'Garmin 910XT','Multisport training watch',349.99,
{'Related':'Timex sports watch',
'Volume Discount':'10'});
INSERT INTO shopping_cart
(username,cart_name,item_id,item_name,description,price,item_detail)
VALUES ('pmcfadin','Gadgets I want',9748575,'Polaris Foot Pod','Bluetooth Smart foot pod',64.00,
{'Related':'Timex foot pod',
'Volume Discount':'25'});
One partition (storage row) of data
Item details. Flexible for any use.
Partition row key for one user's cart
Creates partition row key
Watching users, making decisions. Freaky, but cool.
User Activity Tracking
User activity use case
* React to user input in real time
* Support for multiple application pods
* Scale for speed
* Losing interactions is costly
* Waiting for batch (Hadoop) is too long
The bad
User activity data model
* Interaction points stored per user in short table
* Long term interaction stored in similar table with date partition
* Process long term later using batch
* Reverse time series to get last N items
User activity data model
CREATE TABLE user_activity (
username varchar,
interaction_time timeuuid,
activity_code varchar,
detail varchar,
PRIMARY KEY (username, interaction_time)
) WITH CLUSTERING ORDER BY (interaction_time DESC);
CREATE TABLE user_activity_history (
username varchar,
interaction_date varchar,
interaction_time timeuuid,
activity_code varchar,
detail varchar,
PRIMARY KEY ((username,interaction_date),interaction_time)
);
INSERT INTO user_activity
(username,interaction_time,activity_code,detail)
VALUES ('pmcfadin',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login')
USING TTL 2592000;
INSERT INTO user_activity_history
(username,interaction_date,interaction_time,activity_code,detail)
VALUES ('pmcfadin','20130605',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login');
Reverse order based on timestamp
Expire after 30 days
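The interaction_date column is a day bucket that keeps each history partition bounded. A client might derive it as below; this is a sketch that assumes the yyyymmdd format seen in the INSERT ('20130605'), and the helper name is hypothetical.

```python
from datetime import datetime

def interaction_date(ts):
    """Day bucket used as part of the partition key, e.g. '20130605'."""
    return ts.strftime("%Y%m%d")

def history_partition_key(username, ts):
    """(username, interaction_date) pair that identifies one partition."""
    return (username, interaction_date(ts))

print(history_partition_key("pmcfadin", datetime(2013, 6, 5, 13, 50)))
# ('pmcfadin', '20130605')
```

All interactions by one user on one day land in the same partition; a new day starts a new one.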
Data model usage
select * from user_activity limit 5;
username | interaction_time | detail | activity_code
----------+--------------------------------------+------------------------------------------+------------------
pmcfadin | 9ccc9df0-d076-11e2-923e-5d8390e664ec | Entered shopping area: Jewelry | 301
pmcfadin | 9c652990-d076-11e2-923e-5d8390e664ec | Created shopping cart: Anniversary gifts | 202
pmcfadin | 1b5cef90-d076-11e2-923e-5d8390e664ec | Deleted shopping cart: Gadgets I want | 205
pmcfadin | 1b0e5a60-d076-11e2-923e-5d8390e664ec | Opened shopping cart: Gadgets I want | 201
pmcfadin | 1b0be960-d076-11e2-923e-5d8390e664ec | Normal login | 100
Maybe put a sale item for flowers too?
Machines generate logs at a furious pace. Be ready.
Log collection/aggregation
Log collection use case
* Collect log data at high speed
* Cassandra near where logs are generated. Multi-datacenter
* Dice data for various uses. Dashboard. Lookup. Etc.
* The scale needed for RDBMS is cost prohibitive
* Batch analysis of logs too late for some use cases
The bad
Log collection data model
* Collect and fan out data to various tables
* Tables for lookup based on source and time
* Tables for dashboard with aggregation and summation
Log collection data model
CREATE TABLE log_lookup (
source varchar,
date_to_minute varchar,
timestamp timeuuid,
raw_log blob,
PRIMARY KEY ((source,date_to_minute),timestamp)
);
CREATE TABLE login_success (
source varchar,
date_to_minute varchar,
successful_logins counter,
PRIMARY KEY (source,date_to_minute)
) WITH CLUSTERING ORDER BY (date_to_minute DESC);
CREATE TABLE login_failure (
source varchar,
date_to_minute varchar,
failed_logins counter,
PRIMARY KEY (source,date_to_minute)
) WITH CLUSTERING ORDER BY (date_to_minute DESC);
Consider storing raw logs as GZIP
Log dashboard
SELECT date_to_minute,successful_logins
FROM login_success
LIMIT 20;
SELECT date_to_minute,failed_logins
FROM login_failure
LIMIT 20;
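The counter tables aggregate per source and per minute. The Python sketch below mimics that bucketing in memory. The slides do not show the date_to_minute format, so the yyyymmddHHMM layout here is an assumption, as are the function names and the sample source.

```python
from collections import Counter
from datetime import datetime

def date_to_minute(ts):
    """Minute bucket (assumed format), e.g. '201306051350'."""
    return ts.strftime("%Y%m%d%H%M")

successful_logins = Counter()  # (source, date_to_minute) -> count

def record_login(source, ts):
    """Equivalent in spirit to: successful_logins = successful_logins + 1."""
    successful_logins[(source, date_to_minute(ts))] += 1

record_login("web01", datetime(2013, 6, 5, 13, 50, 10))
record_login("web01", datetime(2013, 6, 5, 13, 50, 45))
record_login("web01", datetime(2013, 6, 5, 13, 51, 2))

print(successful_logins[("web01", "201306051350")])  # 2
```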
Because mistakes happen
User Form Versioning
Form versioning use case
* Store every possible version efficiently
* Scale to any number of users
* Commit/Rollback functionality on a form
* In RDBMS, many relations that need complicated join
* Needs to be in cloud and local data center
The bad
Form version data model
* Each user has a form
* Each form needs versioning
* Separate table to store live version
* Exclusive lock on a form
Form version data model
CREATE TABLE working_version (
username varchar,
form_id int,
version_number int,
locked_by varchar,
form_attributes map<varchar,varchar>,
PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'',
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<radio>':'Y,N'});
UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;
INSERT INTO working_version
(username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,2,null,
{'FirstName<text>':'First Name: ',
'LastName<text>':'Last Name: ',
'EmailAddress<text>':'Email Address: ',
'Newsletter<checkbox>':'Y'});
1. Insert first version
2. Lock for one user
3. Insert new version. Release lock
Light Weight Transactions
The race is on
T0, Process 1:
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';
(0 rows)
T1, Process 2:
SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';
(0 rows)
Got nothing! Good to go!
T2, Process 1:
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00');
T3, Process 2:
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00');
This one wins
The Paxos Consensus Protocol
Quorum-based Algorithm
Extending Paxos
Add third phase to reset state for subsequent proposals
Compare & Set
Read the current value of the row to see if it matches the expected one
Solution LWT
Process 1
T0:
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00')
IF NOT EXISTS;
T1:
[applied]
-----------
True
•Check performed for record
•Paxos ensures exclusive access
•applied = true: Success
Solution LWT
Process 2
T2:
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin',
['paul@oracle.com'],
'ea24e13ad95a209ded8912e937d499de',
'2011-06-20 13:51:00')
IF NOT EXISTS;
T3:
[applied] | username | created_date | firstname | lastname
-----------+----------+--------------------------+-----------+----------
False | pmcfadin | 2011-06-20 13:50:00-0700 | Patrick | McFadin
•applied = false: Rejected
•No record stomping!
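The difference between the plain INSERT race and IF NOT EXISTS can be modeled in a few lines of Python. This is a single-process sketch of the outcome only; it does not implement Paxos, and the function names are invented for illustration.

```python
def upsert(table, username, row):
    """Plain INSERT: the later write silently replaces the earlier one."""
    table[username] = row

def insert_if_not_exists(table, username, row):
    """Compare-and-set: apply only when no row exists yet.

    Returns (applied, existing_row), similar to the [applied] column.
    """
    if username in table:
        return False, table[username]
    table[username] = row
    return True, None

racy, safe = {}, {}

upsert(racy, "pmcfadin", {"firstname": "Patrick"})
upsert(racy, "pmcfadin", {"firstname": "Paul"})
print(racy["pmcfadin"]["firstname"])  # Paul: the first record was stomped

print(insert_if_not_exists(safe, "pmcfadin", {"firstname": "Patrick"}))
# (True, None)
print(insert_if_not_exists(safe, "pmcfadin", {"firstname": "Paul"}))
# (False, {'firstname': 'Patrick'})
```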
Another LWT Example
Prevents a password from being reset twice using the same reset token.
UPDATE users
SET reset_token = null, password = 'newpassword'
WHERE login = 'jbellis'
IF reset_token = 'some-generated-reset-token';
LWT Fine Print
•Light Weight Transactions solve edge conditions
•They have latency cost.
•Be aware
•Load test
•Consider in your data model
Indexing with Tables
• Indexing expresses application intent
• Fast access to specific queries
• Secondary indexes != relational indexes
• Use information you have. No pre-reads.
Goals:
1. Create row key for speed
2. Use wide rows for efficiency
Keyword index
• Use a word as a key
• Columns are the occurrence
• Ex: Index of tag words about videos
CREATE TABLE tag_index (
tag varchar,
videoid uuid,
timestamp timestamp,
PRIMARY KEY (tag, videoid)
);
tag: VideoId1 .. VideoIdN
Fast
Efficient
Partial word index
• Where row size will be large
• Take one part for key, rest for column names
CREATE TABLE email_index (
domain varchar,
user varchar,
username varchar,
PRIMARY KEY (domain, user)
);
INSERT INTO email_index (domain, user, username)
VALUES ('@relational.com','tcodd', 'tcodd');
User: tcodd Email: tcodd@relational.com
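On the client side, the split for this index is a one-liner. A minimal Python sketch, with a hypothetical helper name, that produces the (domain, user) pair used in the INSERT above:

```python
def email_index_key(email):
    """Split an address into (domain, user); the domain keeps its '@'."""
    user, _, domain = email.partition("@")
    return "@" + domain, user

print(email_index_key("tcodd@relational.com"))  # ('@relational.com', 'tcodd')
```

The domain becomes the partition key, so all users of one domain sit in a single wide row.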
Partial word index
• Create partitions + partial indexes
CREATE TABLE product_index (
store int,
part_number0_3 int,
part_number4_9 int,
count int,
PRIMARY KEY ((store,part_number0_3), part_number4_9)
);
INSERT INTO product_index (store,part_number0_3,part_number4_9,count)
VALUES (8675309,7079,748575,3);
SELECT count
FROM product_index
WHERE store = 8675309
AND part_number0_3 = 7079
AND part_number4_9 = 748575;
Compound row key!
Fast and efficient!
• Store #8675309 has 3 of part# 7079748575
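Splitting the part number is done by the application before the query is built. A small Python sketch, assuming a ten-digit part number whose digits 0-3 and 4-9 map to the two columns (the helper name is hypothetical):

```python
def split_part_number(part_number):
    """Return (digits 0-3, digits 4-9) of a ten-digit part number."""
    digits = f"{part_number:010d}"
    return int(digits[:4]), int(digits[4:])

print(split_part_number(7079748575))  # (7079, 748575)
```

The first value joins the store id in the partition key; the second is the clustering column.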
Bit map index – supports ad hoc queries
• Multiple parts to a key
• Create a truth table of the different combinations
• Inserts == the number of combinations
- 3 fields? 7 options (Not going to use null choice)
- 4 fields? 15 options
- 2^n - 1, where n = # of dynamic query fields
Bit map index
• Find a car in a lot by variable combinations
Make | Model | Color | Combination
-----+-------+-------+------------------
     |       | x     | Color
     | x     |       | Model
     | x     | x     | Model+Color
x    |       |       | Make
x    |       | x     | Make+Color
x    | x     |       | Make+Model
x    | x     | x     | Make+Model+Color
Bit map index - Table create
• Make a table with three different key combos
CREATE TABLE car_location_index (
make varchar,
model varchar,
color varchar,
vehical_id int,
lot_id int,
PRIMARY KEY ((make,model,color),vehical_id)
);
Compound row key with three different options
Bit map index - Adding records
• Pre-optimize for 7 possible questions on insert
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','Mustang','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','Mustang','',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('Ford','','',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','Mustang','Blue',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','Mustang','',1234,8675309);
INSERT INTO car_location_index (make,model,color,vehical_id,lot_id)
VALUES ('','','Blue',1234,8675309);
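Writing the seven INSERTs by hand does not scale, so an application would generate them. The Python sketch below enumerates every combination except the all-empty one; `index_rows` is an illustrative helper, not part of any driver.

```python
from itertools import product

def index_rows(make, model, color):
    """All 2^n - 1 key combinations, with '' standing in for 'any'."""
    fields = (make, model, color)
    rows = []
    for mask in product((True, False), repeat=len(fields)):
        if not any(mask):
            continue  # skip the all-empty combination
        rows.append(tuple(v if keep else "" for v, keep in zip(fields, mask)))
    return rows

for row in index_rows("Ford", "Mustang", "Blue"):
    print(row)
```

Each tuple is the (make, model, color) partition key for one INSERT; the vehical_id and lot_id values stay the same across all seven.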
Bit map index - Selecting records
• Different combinations now possible
SELECT vehical_id,lot_id
FROM car_location_index
WHERE make = 'Ford'
AND model = ''
AND color = 'Blue';
vehical_id | lot_id
------------+---------
1234 | 8675309
SELECT vehical_id,lot_id
FROM car_location_index
WHERE make = ''
AND model = ''
AND color = 'Blue';
vehical_id | lot_id
------------+---------
1234 | 8675309
8765 | 5551212
Data Modeling Challenge!
Requirements
• Find orders by customer, supplier, product, and employee
• Optional date range
• Need to display order information – header/details
• List all employees for a manager
• Orders that have not shipped
• By shipper
• By date
©2013 DataStax Confidential. Do not distribute without consent. 104

More Related Content

What's hot

Indexing in Cassandra
Indexing in CassandraIndexing in Cassandra
Indexing in CassandraEd Anuff
 
C* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinC* Summit 2013: The World's Next Top Data Model by Patrick McFadin
C* Summit 2013: The World's Next Top Data Model by Patrick McFadinDataStax Academy
 
Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized ViewsCarl Yeksigian
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...DataStax
 
Bay area Cassandra Meetup 2011
Bay area Cassandra Meetup 2011Bay area Cassandra Meetup 2011
Bay area Cassandra Meetup 2011mubarakss
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data modelPatrick McFadin
 
MariaDB and Cassandra Interoperability
MariaDB and Cassandra InteroperabilityMariaDB and Cassandra Interoperability
MariaDB and Cassandra InteroperabilityColin Charles
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valleyPatrick McFadin
 
Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015StampedeCon
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingVassilis Bekiaris
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraPatrick McFadin
 
MariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityMariaDB Cassandra Interoperability
MariaDB Cassandra InteroperabilityColin Charles
 
Datastax day 2016 introduction to apache cassandra
Datastax day 2016   introduction to apache cassandraDatastax day 2016   introduction to apache cassandra
Datastax day 2016 introduction to apache cassandraDuyhai Doan
 
Cassandra Data Modelling
Cassandra Data ModellingCassandra Data Modelling
Cassandra Data ModellingKnoldus Inc.
 
Introduction to CQL and Data Modeling with Apache Cassandra
Introduction to CQL and Data Modeling with Apache CassandraIntroduction to CQL and Data Modeling with Apache Cassandra
Introduction to CQL and Data Modeling with Apache CassandraJohnny Miller
 
Using Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraUsing Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraJim Hatcher
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 
Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3DataStax
 
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016DataStax
 

Apache Cassandra Data Modeling with Travis Price

  • 1. C* Data Modeling Travis Price, Solution Architect
  • 2. Agenda: C* Schema Overview (Columns and their components, Column Families, Keyspaces); Designing a Data Model (Partitioning, Indexing, Keys); Example Models (Shopping Cart, User Activity, Logging, Form Versioning); Advanced Modeling (LWT, Indexing by Table)
  • 3. Cassandra Architecture Review. (Diagram: writes and reads against a multi-data center Cassandra cluster.) ©2013 DataStax Confidential. Do not distribute without consent.
  • 4. Terms: Cluster, Datacenter, Node, Keyspace, Column Family, Partition, Storage row vs. CQL row, Snitch, Replication Factor, Consistency
  • 5. Cassandra Architecture: System and hardware failures can and do happen. Peer-to-peer, distributed system. All nodes are identical and gossip status and state. Data is partitioned among nodes. Data replication ensures fault tolerance. Read/write anywhere design. (Diagram: ring of eight nodes with tokens 10 through 80.)
  • 6. Keyspaces and Tables • Schema is a row-oriented, column structure • A keyspace is akin to a database • Tables are more flexible/dynamic • A row in a table is indexed by its key • Other columns may be indexed as well [Diagram: a keyspace containing a table with columns Key, Name, SSN, DOB]
  • 7. Writing Data • Data is partitioned by token • Nodes own token and range back to previous token • Nodes may own multiple tokens (rf, vnodes) • Client connects to any node • Node takes on role of coordinator during transaction • hash(key) => token, rd/wr node(s) • Cluster owns all data – typically, each DC does, too • Nodes own slices of data - one, or more • Replicas can be rack-aware [Diagram: a client writes a key with hash(key)=43 to a token ring with tokens 10 to 80]
  • 8. Replication and DCs [Diagram: a client write with hash(key)=43 goes to two token rings: Data Center 1 (tokens 10 to 80, rf=3) and Data Center 2 (tokens 11 to 81, rf=3)]
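The token ownership rule from the "Writing Data" slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ring tokens are the ones shown in the diagram, the helper names `owner` and `replicas` are hypothetical, and replica placement is simplified to "the next nodes clockwise", which ignores racks and data centers.

```python
from bisect import bisect_left

# Tokens from the ring diagram. Each node owns the range from the
# previous token (exclusive) up to its own token (inclusive).
RING = [10, 20, 30, 40, 50, 60, 70, 80]

def owner(token, ring=RING):
    """Return the token of the node that owns `token` (hypothetical helper)."""
    i = bisect_left(ring, token)
    return ring[i % len(ring)]  # wrap past the highest token to the first node

def replicas(token, rf, ring=RING):
    """Owner plus the next rf-1 nodes clockwise (simplified placement, an assumption)."""
    start = bisect_left(ring, token) % len(ring)
    return [ring[(start + k) % len(ring)] for k in range(rf)]

print(owner(43))        # 50
print(replicas(43, 3))  # [50, 60, 70]
print(replicas(75, 3))  # [80, 10, 20]
```

With hash(key)=43 the write lands on the node holding token 50, and with rf=3 the two following nodes also receive a copy.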
  • 9. Tunable Consistency • Choose between strong and eventual consistency (one to all nodes acknowledging) depending upon the need • Can be configured on a per-operation basis, and for both reads and writes • Handles multi-data center transparently • CLwr + CLrd > RF yields consistency
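The rule CLwr + CLrd > RF is plain arithmetic, so a small Python check may help. The function names below are hypothetical; the consistency levels are expressed as the number of replicas that must acknowledge.

```python
def quorum(rf):
    """Smallest majority of rf replicas."""
    return rf // 2 + 1

def is_strongly_consistent(write_acks, read_acks, rf):
    """True when every read set must overlap every write set (CLwr + CLrd > RF)."""
    return write_acks + read_acks > rf

rf = 3
print(quorum(rf))                                          # 2
print(is_strongly_consistent(quorum(rf), quorum(rf), rf))  # True: QUORUM + QUORUM
print(is_strongly_consistent(1, 1, rf))                    # False: ONE + ONE
print(is_strongly_consistent(rf, 1, rf))                   # True: ALL + ONE
```

When the sum exceeds RF, at least one replica contacted by the read has also acknowledged the latest write.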
  • 10. CQL Language • Familiar SQL syntax without joins, group by, etc • Create objects via DDL (e.g. CREATE...) • Core DML commands supported (e.g. INSERT, UPDATE, DELETE) • Query data with SELECT
  • 11. Read / Write Path [Diagram: Bloom Filters, Key Cache, Row Cache, and OS Cache on the read path; Commit Log (1/node) on the write path; memtables (1/cf) in memory and sstables (n/cf) on disk, each holding rows of the form key; col1=val1,..coln=valn]
  • 12. The Cassandra Schema Consists of: • Column • Column Family • Keyspace • Cluster
  • 13. High Level Overview • Keyspace • Column Family / Table • Rows • Columns
  • 14. Components of the Column The column is the fundamental data type in Cassandra and includes: • Column name • Column value • Timestamp • TTL (Optional)
  • 15. The Column Name Value Timestamp (Name: “firstName”, Value: “Travis”, Timestamp: 1363106500)
  • 16. Column Name • Can be any value • Can be any type • Not optional • Must be unique • Stored with every value
  • 17. Column Value • Any value • Any type • Can be empty – but is required
  • 18. Column Names and Values • The data type for a column value is called a validator. • The data type for a column name is called a comparator. • Cassandra also validates the data type of row keys. • Columns are sorted, and stored in sorted order on disk, so you have to specify a comparator for columns. This can be reversed… more on this later
  • 20. Column TimeStamp • 64-bit integer • Best Practice – Should be created in a consistent manner by all your clients • Required
  • 21. Column TTL • Defined on INSERT • Positive delay (in seconds) • After time expires it is marked for deletion
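The column components above (name, value, timestamp, optional TTL) can be modeled in Python to show why a consistent timestamp source matters. This is a sketch under stated assumptions: the `Column` class and both helpers are hypothetical, the newer version simply wins by timestamp, and the tie-breaking choice is arbitrary here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int             # client-supplied write time
    ttl: Optional[int] = None  # seconds; None means the column never expires

def reconcile(a, b):
    """Keep the version with the higher timestamp (last write wins)."""
    return a if a.timestamp >= b.timestamp else b

def is_expired(col, written_at, now):
    """A column with a TTL is marked for deletion once ttl seconds have passed."""
    return col.ttl is not None and now >= written_at + col.ttl

THIRTY_DAYS = 30 * 24 * 60 * 60  # 2592000 seconds

old = Column("firstName", "Travis", 1363106500)
new = Column("firstName", "Trav", 1363106600, ttl=THIRTY_DAYS)
print(reconcile(old, new).value)                       # Trav
print(is_expired(new, written_at=0, now=THIRTY_DAYS))  # True
print(is_expired(old, written_at=0, now=10**12))       # False
```

If two clients generate timestamps from clocks that disagree, the "older" write can win, which is why the slide recommends creating timestamps in a consistent manner.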
  • 22. Special Types of Columns • Counter • Collections
  • 23. Counters • Allows for addition / subtraction • 64-bit value • No timestamp • Example: UPDATE recommendation_summary SET num_products = num_products + 1 WHERE recommendation = 'highly recommend';
  • 24. Cassandra feature - Collections • Collections give you three types: - Set - List - Map • Each allows for dynamic updates • Fully supported in CQL 3 • Requires serialization CREATE TABLE collections_example ( id int PRIMARY KEY, set_example set<text>, list_example list<text>, map_example map<int,text> );
  • 25. Cassandra Collections - Set • Set is sorted by CQL type comparator • Declaration: set_example set<text> (collection name: set_example, collection type: set, CQL type: text) INSERT INTO collections_example (id, set_example) VALUES(1, {'1-one', '2-two'});
  • 26. Cassandra Collections - Set Operations • Adding an element to the set: UPDATE collections_example SET set_example = set_example + {'3-three'} WHERE id = 1; • After adding this element, it will sort to the beginning: UPDATE collections_example SET set_example = set_example + {'0-zero'} WHERE id = 1; • Removing an element from the set: UPDATE collections_example SET set_example = set_example - {'3-three'} WHERE id = 1;
  • 27. Cassandra Collections - List • Ordered by insertion • Declaration: list_example list<text> (collection name: list_example, collection type: list, CQL type: text) INSERT INTO collections_example (id, list_example) VALUES(1, ['1-one', '2-two']);
  • 28. Cassandra Collections - List Operations • Adding an element to the end of a list: UPDATE collections_example SET list_example = list_example + ['3-three'] WHERE id = 1; • Adding an element to the beginning of a list: UPDATE collections_example SET list_example = ['0-zero'] + list_example WHERE id = 1; • Deleting an element from a list: UPDATE collections_example SET list_example = list_example - ['3-three'] WHERE id = 1;
  • 29. Cassandra Collections - Map • Key and value • Key is sorted by CQL type comparator • Declaration: map_example map<int,text> (collection name: map_example, collection type: map, key CQL type: int, value CQL type: text) INSERT INTO collections_example (id, map_example) VALUES(1, { 1 : 'one', 2 : 'two' });
  • 30. Cassandra Collections - Map Operations • Add an element to the map: UPDATE collections_example SET map_example[3] = 'three' WHERE id = 1; • Update an existing element in the map: UPDATE collections_example SET map_example[3] = 'tres' WHERE id = 1; • Delete an element in the map: DELETE map_example[3] FROM collections_example WHERE id = 1;
  • 31. User model • Enhance a simple user table • Great for static + some dynamic • Takes advantage of row level isolation CREATE TABLE user_with_location ( username text PRIMARY KEY, first_name text, last_name text, address1 text, city text, postal_code text, last_login timestamp, location_by_date map<timeuuid,text> );
  • 32. User profile - Operations • Adding new login locations to the map: UPDATE user_with_location SET last_login = toTimestamp(now()), location_by_date = location_by_date + {now() : '123.123.123.1'} WHERE username='PatrickMcFadin'; • Adding new login locations to the map + TTL! UPDATE user_with_location USING TTL 2592000 /* 30 days */ SET last_login = toTimestamp(now()), location_by_date = location_by_date + {now() : '123.123.123.1'} WHERE username='PatrickMcFadin';
  • 33. The Cassandra Schema Consists of: • Column • Column Family • Keyspace • Cluster
  • 34. Column Families / Tables • Similar to tables • Groupings of Rows • Tunable Consistency • De-Normalization • To avoid I/O • Simplify the Read Path • Static or Dynamic
  • 35. Static Column Families • Are the most similar to a relational table • Most rows have the same column names • Columns in rows can be different
      Row Key  | Columns
      jbellis  | Name: Jonathan | Email: jb@ds.com | Address: 123 main   | State: TX
      dhutch   | Name: Daria    | Email: dh@ds.com | Address: 45 2nd St. | State: CA
      egilmore | Name: eric     | Email: eg@ds.com
  • 36. Dynamic Column Families • Also called “wide rows” • Structured so a query into the row will answer a question
      Row Key  | Columns
      jbellis  | dhutch | egilmore | datastax | mzcassie
      dhutch   | egilmore
      egilmore | datastax | mzcassie
  • 38. Clustering Order (Comparators) • Sorts columns on disk by default • Can change the order
  • 39. The Cassandra Schema Consists of: • Column • Column Family • Keyspace • Cluster
  • 40. Keyspaces • Are groupings of Column Families • Replication strategies • Replication factor CREATE KEYSPACE demodb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 3}; ALTER KEYSPACE dsg WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 3};
  • 41. Replication Strategies Two replication strategies are available: • SimpleStrategy: Use for a single data center only. If you ever intend more than one data center, use the NetworkTopologyStrategy. • NetworkTopologyStrategy: Highly recommended for most deployments because it is much easier to expand to multiple data centers when required by future expansion.
  • 42. Snitches • A snitch determines which data centers and racks are written to and read from. • Snitches inform Cassandra about the network topology so that requests are routed efficiently, and allow Cassandra to distribute replicas by grouping machines into data centers and racks. All nodes must have exactly the same snitch configuration. Cassandra does its best not to have more than one replica on the same rack (which is not necessarily a physical location).
  • 43. Snitch Types SimpleSnitch • single-data center deployments (or single-zone in public clouds) RackInferringSnitch • determines the location of nodes by rack and data center, which are assumed to correspond to the 3rd and 2nd octet of the node's IP address PropertyFileSnitch • user-defined description of the network details • cassandra-topology.properties file GossipingPropertyFileSnitch • defines a local node's data center and rack • uses gossip for propagating this information to other nodes • conf/cassandra-rackdc.properties Amazon Snitches • EC2Snitch • EC2MultiRegionSnitch
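The RackInferringSnitch rule on the slide above (data center from the 2nd octet, rack from the 3rd) is easy to picture with a short Python sketch. The function name and the sample address are hypothetical; this mirrors the rule as stated on the slide rather than the snitch's actual implementation.

```python
def infer_location(ip):
    """Return (datacenter, rack) from the 2nd and 3rd octets of an IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return octets[1], octets[2]

print(infer_location("10.20.114.10"))  # ('20', '114')
```

Two nodes at 10.20.114.10 and 10.20.115.10 would therefore be treated as the same data center but different racks.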
  • 45. Partitioners Generates the token for a key, determining which node owns the record Cassandra offers the following partitioners: • Murmur3Partitioner (default): uniformly distributes data across the cluster based on MurmurHash hash values. • RandomPartitioner: uniformly distributes data across the cluster based on MD5 hash values. • ByteOrderedPartitioner: keeps an ordered distribution of data lexically by key bytes
  • 46. Partitioners – paging through all rows • A range query on the partition key itself, such as WHERE k > 42, is forbidden by CQL 3 unless the partitioner in use is ordered. • Even when using the random partitioner or the murmur3 partitioner, it can sometimes be useful to page through all rows. For this purpose, CQL 3 includes the token function: SELECT * FROM test WHERE token(k) > token(42); • The ByteOrdered partitioner arranges tokens the same way as key values, but the RandomPartitioner and Murmur3Partitioner distribute tokens in a completely unordered manner. • The token function makes it possible to page through unordered partitioner results. Using the token function actually queries results directly using tokens. Underneath, the token function makes token-based comparisons: it compares the tokens of the keys, not the key values themselves (so the query is not equivalent to k > 42).
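Paging by token can be simulated in Python to show why it visits every row exactly once even though the order looks random. This is a sketch, not Cassandra's code: `token` here is a stand-in that uses MD5 (the hash the slides name for RandomPartitioner), and `page` is a hypothetical helper playing the role of SELECT ... WHERE token(k) > token(last) LIMIT n.

```python
import hashlib

def token(key):
    """Stand-in for a hashing partitioner: MD5 of the key as a big integer."""
    return int.from_bytes(hashlib.md5(str(key).encode()).digest(), "big")

def page(keys, after_token=None, limit=3):
    """Return up to `limit` keys whose token is greater than `after_token`, in token order."""
    ordered = sorted(keys, key=token)
    if after_token is not None:
        ordered = [k for k in ordered if token(k) > after_token]
    return ordered[:limit]

keys = list(range(10))
seen, last = [], None
while True:
    batch = page(keys, after_token=last)
    if not batch:
        break
    seen.extend(batch)
    last = token(batch[-1])  # next page starts after the last token seen

print(sorted(seen) == keys)  # True: every key is visited exactly once
print(seen)                  # keys appear in token order, not key order
```

The loop never compares key values, only tokens, which is the distinction the slide draws between token(k) > token(42) and k > 42.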
  • 47. Primary Index Overview • Index for all of your row keys • Per-node index • Partitioner + placement manages which node • Keys are just kept in ordered buckets • Partitioner determines how K → Token
  • 48. Natural Keys Examples: • An email address • A user id • Easy to make the relationship • Less de-normalization • More risk of an 'UPSERT' • Changing the key requires a bulk copy operation
  • 49. Surrogate Keys • Example: • UUID • Independently generated • Allows you to store multiple versions of a user • Relationship is now indirect • Changing the key requires the creation of a new row, or column
  • 51. Key Structure CREATE TABLE playlists ( id uuid, song_id uuid, title text, album text, artist text, PRIMARY KEY (id, song_id) ); • id: Partition (row) Key • song_id: Clustering Key
  • 52. Sorting • It's Free! • ONLY on the second column in compound Primary Key
  • 53. Composite Partition Keys • storeid and eventid are used to create the token CREATE TABLE pricechange( storeid int, eventid int, sku text, oldprice double, newprice double, PRIMARY KEY ((storeid, eventid), sku) );
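The split between partition key and clustering key in the key-structure slides can be pictured as "group, then sort". The Python sketch below assumes a handful of made-up rows for the pricechange table; the `partitions` helper is hypothetical and only models the layout, not storage.

```python
from collections import defaultdict

# Hypothetical rows for the pricechange table: (storeid, eventid, sku, newprice).
rows = [
    (1, 100, "sku-b", 9.99),
    (1, 100, "sku-a", 4.50),
    (1, 200, "sku-a", 4.25),
    (2, 100, "sku-c", 7.00),
]

def partitions(rows):
    """Group rows by the composite partition key, sorted by the clustering column."""
    parts = defaultdict(list)
    for storeid, eventid, sku, newprice in rows:
        parts[(storeid, eventid)].append((sku, newprice))
    return {key: sorted(cells) for key, cells in parts.items()}

for key, cells in partitions(rows).items():
    print(key, cells)
```

Each (storeid, eventid) pair forms one partition, and the sku values inside it come back already sorted, which is the "free" sorting on the clustering column.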
  • 54. Secondary Indexes • Need for an easy way to do limited ad-hoc queries • Supports multiple per row • Single clause can support multiple selectors • Implemented as a hash map, not B-Tree • Low cardinality ONLY
  • 57. ALLOW FILTERING • Valid queries: CREATE TABLE users ( username text PRIMARY KEY, firstname text, lastname text, birth_year int, country text ); CREATE INDEX ON users(birth_year); SELECT * FROM users; SELECT firstname, lastname FROM users WHERE birth_year = 1981; • Invalid query: SELECT firstname, lastname FROM users WHERE birth_year = 1981 AND country = 'FR'; • Accepted only with ALLOW FILTERING: SELECT firstname, lastname FROM users WHERE birth_year = 1981 AND country = 'FR' ALLOW FILTERING; • Cassandra cannot guarantee that it won't have to scan a large amount of data even if the result of the query is small. Typically, it will scan all the index entries for users born in 1981 even if only a handful are actually from France.
  • 59. The Basics of C* Modeling • Work backwards • What does your application do? • What are the access patterns? • Now design your data model
  • 60. Procedures Consider use case requirements • What data? • Ordering? • Filtering? • Grouping? • Events in chronological order? • Does the data expire?
  • 61. De-Normalization • De-Normalize in C* • Forget third normal form • Data Duplication is OK • Concerns • Resource contention • Latency • Client-side joins • Avoid them in your C* code
  • 62. Foreign Keys • There are no foreign keys • No server-side joins
  • 63. Table Design • Ideally each query will be one row • Compared to other resources, disk space is cheap • Reduce disk seeks • Reduce network traffic
  • 64. Workload Preference • High level of de-normalization means you may have to write the same data many times • Cassandra handles large numbers of writes well • If given the choice: • Read once • Write many
  • 65. Concurrent Writes • A row is always referenced by a Key • Keys are just bytes • They must be unique within a CF • Primary keys are unique • But Cassandra will not enforce uniqueness • If you are not careful you will accidentally [UPSERT] the whole thing
  • 66. The 5 C* Commandments for Developers 1. Start with queries. Don't data model for data modeling's sake. 2. It's ok to duplicate data. 3. C* is designed to read and write sequentially. Great for rotational disk, awesome for SSDs, awful for NAS. If your disk has an Ethernet port, it's not good for C*. 4. Use secondary indexes strategically and cautiously. 5. Embrace wide rows and de-normalization
  • 67. Shopping Cart Data Model: Online stores require 100% uptime
  • 68. Shopping cart use case * Store shopping cart data reliably * Minimize (or eliminate) downtime. Multi-dc * Scale for the “Cyber Monday” problem * Every minute off-line is lost $$ * Online shoppers want speed! The bad
  • 69. Shopping cart data model * Each customer can have one or more shopping carts * De-normalize data for fast access * Shopping cart == One partition (Row Level Isolation) * Each item a new column
  • 70. Shopping cart data model CREATE TABLE user ( username varchar, firstname varchar, lastname varchar, shopping_carts set<varchar>, PRIMARY KEY (username) ); CREATE TABLE shopping_cart ( username varchar, cart_name text, item_id int, item_name varchar, description varchar, price float, item_detail map<varchar,varchar>, PRIMARY KEY ((username,cart_name),item_id) ); INSERT INTO shopping_cart (username,cart_name,item_id,item_name,description,price,item_detail) VALUES ('pmcfadin','Gadgets I want',8675309,'Garmin 910XT','Multisport training watch',349.99, {'Related':'Timex sports watch', 'Volume Discount':'10'}); INSERT INTO shopping_cart (username,cart_name,item_id,item_name,description,price,item_detail) VALUES ('pmcfadin','Gadgets I want',9748575,'Polaris Foot Pod','Bluetooth Smart foot pod',64.00, {'Related':'Timex foot pod', 'Volume Discount':'25'}); • (username,cart_name) creates the partition row key: the partition row key for one user's cart • The two inserts land in one partition (storage row) of data • item_detail holds item details. Flexible for any use.
  • 71. User Activity Tracking: Watching users, making decisions. Freaky, but cool.
  • 72. User activity use case * React to user input in real time * Support for multiple application pods * Scale for speed * Losing interactions is costly * Waiting for batch (Hadoop) is too long The bad
  • 73. User activity data model * Interaction points stored per user in short table * Long term interaction stored in similar table with date partition * Process long term later using batch * Reverse time series to get last N items
  • 74. User activity data model • Reverse order based on timestamp: CREATE TABLE user_activity ( username varchar, interaction_time timeuuid, activity_code varchar, detail varchar, PRIMARY KEY (username, interaction_time) ) WITH CLUSTERING ORDER BY (interaction_time DESC); CREATE TABLE user_activity_history ( username varchar, interaction_date varchar, interaction_time timeuuid, activity_code varchar, detail varchar, PRIMARY KEY ((username,interaction_date),interaction_time) ); • Expire after 30 days: INSERT INTO user_activity (username,interaction_time,activity_code,detail) VALUES ('pmcfadin',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login') USING TTL 2592000; INSERT INTO user_activity_history (username,interaction_date,interaction_time,activity_code,detail) VALUES ('pmcfadin','20130605',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login');
  • 75. Data model usage select * from user_activity limit 5;
       username | interaction_time                     | detail                                   | activity_code
      ----------+--------------------------------------+------------------------------------------+---------------
       pmcfadin | 9ccc9df0-d076-11e2-923e-5d8390e664ec | Entered shopping area: Jewelry           | 301
       pmcfadin | 9c652990-d076-11e2-923e-5d8390e664ec | Created shopping cart: Anniversary gifts | 202
       pmcfadin | 1b5cef90-d076-11e2-923e-5d8390e664ec | Deleted shopping cart: Gadgets I want    | 205
       pmcfadin | 1b0e5a60-d076-11e2-923e-5d8390e664ec | Opened shopping cart: Gadgets I want     | 201
       pmcfadin | 1b0be960-d076-11e2-923e-5d8390e664ec | Normal login                             | 100
    Maybe put a sale item for flowers too?
  • 76. Log collection/aggregation: Machines generate logs at a furious pace. Be ready.
  • 77. Log collection use case * Collect log data at high speed * Cassandra near where logs are generated. Multi-datacenter * Dice data for various uses. Dashboard. Lookup. Etc. * The scale needed for RDBMS is cost prohibitive * Batch analysis of logs too late for some use cases The bad
  • 78. Log collection data model * Collect and fan out data to various tables * Tables for lookup based on source and time * Tables for dashboard with aggregation and summation
  • 79. Log collection data model CREATE TABLE log_lookup ( source varchar, date_to_minute varchar, timestamp timeuuid, raw_log blob, PRIMARY KEY ((source,date_to_minute),timestamp) ); CREATE TABLE login_success ( source varchar, date_to_minute varchar, successful_logins counter, PRIMARY KEY (source,date_to_minute) ) WITH CLUSTERING ORDER BY (date_to_minute DESC); CREATE TABLE login_failure ( source varchar, date_to_minute varchar, failed_logins counter, PRIMARY KEY (source,date_to_minute) ) WITH CLUSTERING ORDER BY (date_to_minute DESC); Consider storing raw logs as GZIP
  • 80. Log dashboard SELECT date_to_minute,successful_logins FROM login_success LIMIT 20; SELECT date_to_minute,failed_logins FROM login_failure LIMIT 20;
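The dashboard tables above bucket counters by source and minute. A small Python sketch may make the bucketing concrete. Everything here is illustrative: the slides do not give the date_to_minute format, so the 'YYYYMMDDHHMM' string is an assumption, the source name and events are made up, and a `Counter` stands in for the login_success counter table.

```python
from collections import Counter
from datetime import datetime

def date_to_minute(ts):
    """Bucket a timestamp to the minute (the 'YYYYMMDDHHMM' format is an assumption)."""
    return ts.strftime("%Y%m%d%H%M")

successful_logins = Counter()  # stands in for the login_success counter table

# Hypothetical login events: (source, time of login).
events = [
    ("web01", datetime(2013, 6, 5, 13, 42, 7)),
    ("web01", datetime(2013, 6, 5, 13, 42, 55)),
    ("web01", datetime(2013, 6, 5, 13, 43, 1)),
]
for source, ts in events:
    successful_logins[(source, date_to_minute(ts))] += 1

# Newest bucket first, mirroring CLUSTERING ORDER BY (date_to_minute DESC).
for (source, bucket), count in sorted(successful_logins.items(), reverse=True):
    print(source, bucket, count)
```

Because the bucket string sorts lexically in time order, a descending clustering order puts the most recent minutes at the front of each partition, so a LIMIT query reads the newest buckets first.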
  • 81. User Form Versioning: Because mistakes happen
  • 82. Form versioning use case * Store every possible version efficiently * Scale to any number of users * Commit/Rollback functionality on a form * In RDBMS, many relations that need complicated join * Needs to be in cloud and local data center The bad
  • 83. Form version data model * Each user has a form * Each form needs versioning * Separate table to store live version * Exclusive lock on a form
  • 84. Form version data model CREATE TABLE working_version ( username varchar, form_id int, version_number int, locked_by varchar, form_attributes map<varchar,varchar>, PRIMARY KEY ((username, form_id), version_number) ) WITH CLUSTERING ORDER BY (version_number DESC); • 1. Insert first version: INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes) VALUES ('pmcfadin',1138,1,'', {'FirstName<text>':'First Name: ', 'LastName<text>':'Last Name: ', 'EmailAddress<text>':'Email Address: ', 'Newsletter<radio>':'Y,N'}); • 2. Lock for one user: UPDATE working_version SET locked_by = 'pmcfadin' WHERE username = 'pmcfadin' AND form_id = 1138 AND version_number = 1; • 3. Insert new version. Release lock: INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes) VALUES ('pmcfadin',1138,2,null, {'FirstName<text>':'First Name: ', 'LastName<text>':'Last Name: ', 'EmailAddress<text>':'Email Address: ', 'Newsletter<checkbox>':'Y'});
  • 86. The race is on • T0, Process 1: SELECT firstName, lastName FROM users WHERE username = 'pmcfadin'; (0 rows) • T1, Process 2: SELECT firstName, lastName FROM users WHERE username = 'pmcfadin'; (0 rows) • Got nothing! Good to go! • T2, Process 1: INSERT INTO users (username, firstname, lastname, email, password, created_date) VALUES ('pmcfadin','Patrick','McFadin', ['patrick@datastax.com'], 'ba27e03fd95e507daf2937c937d499ab', '2011-06-20 13:50:00'); • T3, Process 2: INSERT INTO users (username, firstname, lastname, email, password, created_date) VALUES ('pmcfadin','Paul','McFadin', ['paul@oracle.com'], 'ea24e13ad95a209ded8912e937d499de', '2011-06-20 13:51:00'); This one wins
  • 87. The Paxos Consensus Protocol Quorum-based Algorithm
  • 88. Extending Paxos Add third phase to reset state for subsequent proposals
  • 89. Compare & Set Read the current value of the row to see if it matches the expected one
  • 90. Solution LWT • Process 1, T0: INSERT INTO users (username, firstname, lastname, email, password, created_date) VALUES ('pmcfadin','Patrick','McFadin', ['patrick@datastax.com'], 'ba27e03fd95e507daf2937c937d499ab', '2011-06-20 13:50:00') IF NOT EXISTS; • T1: [applied] = True • Check performed for record • Paxos ensures exclusive access • applied = true: Success
  • 91. Solution LWT • Process 2, T2: INSERT INTO users (username, firstname, lastname, email, password, created_date) VALUES ('pmcfadin','Paul','McFadin', ['paul@oracle.com'], 'ea24e13ad95a209ded8912e937d499de', '2011-06-20 13:51:00') IF NOT EXISTS; • T3:
       [applied] | username | created_date             | firstname | lastname
      -----------+----------+--------------------------+-----------+----------
           False | pmcfadin | 2011-06-20 13:50:00-0700 | Patrick   | McFadin
    • applied = false: Rejected • No record stomping!
  • 92. Another LWT Example Prevents a password from being reset twice using the same reset token. UPDATE users SET reset_token = null, password = 'newpassword' WHERE login = 'jbellis' IF reset_token = 'some-generated-reset-token';
  • 93. LWT Fine Print • Lightweight transactions solve edge conditions • They have a latency cost • Be aware • Load test • Consider in your data model
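The compare-and-set behavior behind IF NOT EXISTS and IF reset_token = ... can be mimicked in a single Python process. This is only a sketch of the observable semantics: there is no Paxos or replication here, the `users` dict stands in for the table, and both helper functions are hypothetical names.

```python
users = {}  # stands in for the users table, keyed by username

def insert_if_not_exists(username, row):
    """Return (applied, existing_row), mimicking the [applied] column of the result."""
    if username in users:
        return False, users[username]
    users[username] = row
    return True, None

def update_if(username, column, expected, new_values):
    """Apply new_values only if `column` still holds `expected` (compare and set)."""
    row = users.get(username)
    if row is None or row.get(column) != expected:
        return False
    row.update(new_values)
    return True

print(insert_if_not_exists("pmcfadin", {"firstname": "Patrick"}))  # (True, None)
print(insert_if_not_exists("pmcfadin", {"firstname": "Paul"}))     # (False, {'firstname': 'Patrick'})

users["jbellis"] = {"reset_token": "some-generated-reset-token", "password": "old"}
reset = {"reset_token": None, "password": "newpassword"}
print(update_if("jbellis", "reset_token", "some-generated-reset-token", reset))  # True
print(update_if("jbellis", "reset_token", "some-generated-reset-token", reset))  # False
```

The second insert and the second reset are both rejected because the condition is checked and the write applied as one step. In Cassandra that atomicity across replicas is what Paxos pays for, hence the latency cost noted above.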
  • 94. Indexing with Tables • Indexing expresses application intent • Fast access to specific queries • Secondary indexes != relational indexes • Use information you have. No pre-reads. Goals: 1. Create row key for speed 2. Use wide rows for efficiency
  • 95. Keyword index • Use a word as a key • Columns are the occurrence • Ex: Index of tag words about videos CREATE TABLE tag_index ( tag varchar, videoid uuid, timestamp timestamp, PRIMARY KEY (tag, videoid) ); [Diagram: row key tag (fast) with columns VideoId1 .. VideoIdN (efficient)]
  • 96. Partial word index • Where row size will be large • Take one part for key, rest for column name CREATE TABLE email_index ( domain varchar, user varchar, username varchar, PRIMARY KEY (domain, user) ); • User: tcodd, Email: tcodd@relational.com INSERT INTO email_index (domain, user, username) VALUES ('@relational.com','tcodd', 'tcodd');
  • 97. Partial word index • Create partitions + partial indexes CREATE TABLE product_index ( store int, part_number0_3 int, part_number4_9 int, count int, PRIMARY KEY ((store,part_number0_3), part_number4_9) ); INSERT INTO product_index (store,part_number0_3,part_number4_9,count) VALUES (8675309,7079,748575,3); SELECT count FROM product_index WHERE store = 8675309 AND part_number0_3 = 7079 AND part_number4_9 = 748575; • Compound row key! • Fast and efficient! • Store #8675309 has 3 of part# 7079748575
  • 98. Bit map index – supports ad hoc queries • Multiple parts to a key • Create a truth table of the different combinations • Inserts == the number of combinations - 3 fields? 7 options (Not going to use null choice) - 4 fields? 15 options - 2^n – 1, where n = # of dynamic query fields
  • 99. Bit map index • Find a car in a lot by variable combinations
      Make | Model | Color | Combination
           |       |   x   | Color
           |   x   |       | Model
           |   x   |   x   | Model+Color
        x  |       |       | Make
        x  |       |   x   | Make+Color
        x  |   x   |       | Make+Model
        x  |   x   |   x   | Make+Model+Color
  • 100. Bit map index - Table create • Make a table with three different key combos CREATE TABLE car_location_index ( make varchar, model varchar, color varchar, vehical_id int, lot_id int, PRIMARY KEY ((make,model,color),vehical_id) ); • Compound row key with three different options
  • 101. Bit map index - Adding records • Pre-optimize for 7 possible questions on insert INSERT INTO car_location_index (make,model,color,vehical_id,lot_id) VALUES ('Ford','Mustang','Blue',1234,8675309); INSERT INTO car_location_index (make,model,color,vehical_id,lot_id) VALUES ('Ford','Mustang','',1234,8675309); INSERT INTO car_location_index (make,model,color,vehical_id,lot_id) VALUES ('Ford','','Blue',1234,8675309); INSERT INTO car_location_index (make,model,color,vehical_id,lot_id) VALUES ('Ford','','',1234,8675309); INSERT INTO car_location_index (make,model,color,vehical_id,lot_id) VALUES ('','Mustang','Blue',1234,8675309); INSERT INTO car_location_index (make,model,color,vehical_id,lot_id) VALUES ('','Mustang','',1234,8675309); INSERT INTO car_location_index (make,model,color,vehical_id,lot_id) VALUES ('','','Blue',1234,8675309);
  • 102. Bit map index - Selecting records • Different combinations now possible SELECT vehical_id,lot_id FROM car_location_index WHERE make = 'Ford' AND model = '' AND color = 'Blue';
       vehical_id | lot_id
      ------------+---------
             1234 | 8675309
    SELECT vehical_id,lot_id FROM car_location_index WHERE make = '' AND model = '' AND color = 'Blue';
       vehical_id | lot_id
      ------------+---------
             1234 | 8675309
             8765 | 5551212
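The 2^n – 1 key combinations for the bit map index can be generated rather than typed by hand. The Python sketch below is an illustration only: `index_keys` is a hypothetical helper, and it uses the empty string for a field that is left out, as the INSERT statements in the slides do.

```python
from itertools import product

def index_keys(make, model, color):
    """All non-empty combinations of the fields, with '' for fields left out."""
    fields = (make, model, color)
    keys = []
    for mask in product((True, False), repeat=len(fields)):
        if not any(mask):
            continue  # skip the all-empty combination (the "null choice")
        keys.append(tuple(f if keep else "" for f, keep in zip(fields, mask)))
    return keys

keys = index_keys("Ford", "Mustang", "Blue")
print(len(keys))  # 7 == 2**3 - 1
for key in keys:
    print(key)
```

An application would issue one INSERT per generated key, which is the write amplification this pattern trades for single-partition reads on any combination.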
  • 103. Data Modeling Challenge!
  • 104. Requirements • Find orders by customer, supplier, product, and employee • Optional date range • Need to display order information – header/details • List all employees for a manager • Orders that have not shipped • By shipper • By date

Editor's Notes

  1. You can define the validator and comparator when you create your table schema (which is recommended), but Cassandra does not require it. Internally, Cassandra stores column names and values as hex byte arrays (BytesType). This is the default client encoding used if data types are not defined in the table schema (or if not specified by the client request).
  2. Cassandra comes with the following built-in data types, which can be used as both validators (row key and column value data types) or comparators (column name data types). One exception is CounterColumnType, which is only allowed as a column value (not allowed for row keys or column names).
  3. SELECT WRITETIME (title) FROM songs WHERE id = 8a172618-b121-4136-bb10-f665cfc469eb; writetime(title) ------------------ 1353890782373000
  4. A counter is a special kind of column used to store a number that incrementally counts the occurrences of a particular event or process. For example, you might use a counter column to count the number of times a page is viewed. Counter column tables must use Counter data type. Counters may only be stored in dedicated tables. After a counter is defined, the client application then updates the counter column value by incrementing (or decrementing) it. A client update to a counter column passes the name of the counter and the increment (or decrement) value; no timestamp is required. Internally, the structure of a counter column is a bit more complex. Cassandra tracks the distributed state of the counter as well as a server-generated timestamp upon deletion of a counter column. For this reason, it is important that all nodes in your cluster have their clocks synchronized using a source such as network time protocol (NTP). Unlike normal columns, a write to a counter requires a read in the background to ensure that distributed counter values remain consistent across replicas. Typically, you use a consistency level of ONE with counters because during a write operation, the implicit read does not impact write latency.
  5. Using clustering order You can order query results to make use of the on-disk sorting of columns. You can order results in ascending or descending order. The ascending order will be more efficient than descending. If you need results in descending order, you can specify a clustering order to store columns on disk in the reverse order of the default. Descending queries will then be faster than ascending ones. The following example shows a table definition that changes the clustering order to descending by insertion time. create table timeseries ( event_type text, insertion_time timestamp, event blob, PRIMARY KEY (event_type, insertion_time) ) WITH CLUSTERING ORDER BY (insertion_time DESC);
  6. CREATE KEYSPACE demodb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 3}; ALTER KEYSPACE dsg WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor' : 3};
  7. A range query on the partition key itself, such as WHERE k > 42, is forbidden by CQL 3 unless the partitioner in use is ordered. Even when using the random partitioner or the murmur3 partitioner, it can sometimes be useful to page through all rows. For this purpose, CQL 3 includes the token function, as in SELECT * FROM test WHERE token(k) > token(42); The ByteOrderedPartitioner arranges tokens the same way as key values, but the RandomPartitioner and Murmur3Partitioner distribute tokens in a completely unordered manner. The token function makes it possible to page through unordered partitioner results. Using the token function actually queries results directly using tokens. Underneath, the token function makes token-based comparisons: it compares the tokens of the keys, not the key values themselves (so the query is not equivalent to k > 42).
  8. What is it? An index for all of your row keys. A key is mapped to a token, and from that token we can find the data in our ring. • Index for all of your row keys • Per-node index • Partitioner + placement manages which node • Keys are just kept in ordered buckets • Partitioner determines how K → Token. In relational database design, a primary key is the unique key used to identify each row in a table. A primary key index, like any index, speeds up random access to data in the table. The primary key also ensures record uniqueness, and may also control the order in which records are physically clustered, or stored by the database. In Cassandra, the primary index for a column family is the index of its row keys. Each node maintains this index for the data it manages. Rows are assigned to nodes by the cluster-configured partitioner and the keyspace-configured replica placement strategy. The primary index in Cassandra allows looking up of rows by their row key. Since each node knows what ranges of keys each node manages, requested rows can be efficiently located by scanning the row indexes only on the relevant replicas. With randomly partitioned row keys (RandomPartitioner, the default before Murmur3Partitioner replaced it), row keys are partitioned by their MD5 hash and cannot be scanned in order like traditional b-tree indexes. Using an ordered partitioner does allow for range queries over rows, but is not recommended because of the difficulty in maintaining even data distribution across nodes. See About Data Partitioning in Cassandra for more information.
  9. Cassandra uses the first column name in the primary key definition as the partition key. For example, in the playlists table, id is the partition key. The remaining columns in the primary key definition, those that are not part of the partition key, are the clustering columns. In the case of the playlists table, song_id is the clustering column. The data for each partition is clustered by the remaining column or columns of the primary key definition. On a physical node, rows for a partition key are stored in order based on the clustering columns, which makes retrieval of those rows very efficient. For example, because id is the partition key of the playlists table, all the songs for a playlist are clustered in the order of the remaining song_id column. Insertion, update, and deletion operations on rows sharing the same partition key for a table are performed atomically and in isolation.
  10. CREATE TABLE playlists (id uuid, song_id uuid, title text, album text, artist text, PRIMARY KEY (id, song_id));
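To make the partition key and clustering column roles concrete, here is a minimal Python sketch of the storage layout for a table shaped like playlists. It is a toy model, not how Cassandra is implemented: `ToyTable` and its methods are hypothetical names, and short strings and integers stand in for the uuid values of id and song_id.

```python
import bisect

class ToyTable:
    """Toy model: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        self.partitions = {}

    def insert(self, partition_key, clustering_key, values):
        rows = self.partitions.setdefault(partition_key, [])
        keys = [ck for ck, _ in rows]
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, values)        # same primary key: overwrite
        else:
            rows.insert(i, (clustering_key, values))  # keep clustering order

    def select(self, partition_key):
        """Return all rows of one partition, already in clustering order."""
        return list(self.partitions.get(partition_key, []))

playlists = ToyTable()
playlists.insert("p1", 3, {"title": "c"})
playlists.insert("p1", 1, {"title": "a"})
playlists.insert("p1", 2, {"title": "b"})
playlists.insert("p1", 1, {"title": "a2"})  # same (id, song_id): replaces the row
playlists.insert("p2", 9, {"title": "z"})
```

Reading one playlist touches a single partition and returns its songs in song_id order without any sorting at read time, which is the efficiency the note describes.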
  11. ORDER BY clauses can select a single column only. That column has to be the second column in a compound PRIMARY KEY. This also applies to tables with more than two column components in the primary key. Ordering can be ascending or descending, specified with the ASC or DESC keywords; the default is ascending. In short, you get sorting for free with Cassandra, but it is not always as straightforward as you would expect.
  12. CREATE TABLE pricechange (storeid int, eventid int, sku text, oldprice double, newprice double, PRIMARY KEY ((storeid, eventid), sku));
  13. Secondary Indexes came about in Cassandra because of a need for an easy way to query data.
  14. An advantage of secondary indexes is the operational ease of populating and maintaining the index. Secondary indexes are built in the background automatically, without blocking reads or writes. Client-maintained 'column families as indexes' must be created manually; for example, if the state column had been indexed by creating a column family such as users_by_state, your client application would have to populate that column family with data from the users column family. Using CQL, you can create a secondary index on a column after defining a column family. Using CQL 3, for example, you can define a static column family and then create a secondary index on two of its named columns. Secondary index names, in this case users_state, must be unique within the keyspace. Because of the secondary index created for the state column, after inserting values into the column family you can query Cassandra directly for users who live in a given state.
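The extra work implied by a client-maintained index can be sketched in Python. This is an illustrative toy, not a Cassandra client: the two dictionaries stand in for the users and users_by_state column families named above, and `add_user` and `users_in_state` are hypothetical helpers.

```python
users = {}           # user_id -> {"name": ..., "state": ...}
users_by_state = {}  # state -> set of user_ids (the hand-maintained "index")

def add_user(user_id, name, state):
    """Write the user row and keep the index in step with it."""
    old = users.get(user_id)
    if old is not None and old["state"] != state:
        # The client must remove the stale index entry itself.
        users_by_state[old["state"]].discard(user_id)
    users[user_id] = {"name": name, "state": state}
    users_by_state.setdefault(state, set()).add(user_id)

def users_in_state(state):
    """Look up user ids through the index instead of scanning all users."""
    return sorted(users_by_state.get(state, set()))

add_user(1, "ann", "TX")
add_user(2, "bob", "TX")
add_user(3, "cat", "UT")
add_user(2, "bob", "UT")  # bob moves: two structures must change together
```

With a built-in secondary index, the bookkeeping inside `add_user` is what Cassandra takes over from the application.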
  15. Cassandra supports these conditional operators: =, >, >=, <, and <=, but not all of them in every situation. • A filter based on a non-equals condition on a partition key is supported only if the partitioner is an ordered one. • WHERE clauses can include greater-than and less-than comparisons, but for a given partition key, the conditions on the clustering column are restricted to filters that allow Cassandra to select a contiguous ordering of rows. For example, consider a query that selects data about stewards whose reign started by 2450 and ended before 2500. If king were not a component of the primary key, you would need to create a secondary index on king to use this query. To allow Cassandra to select a contiguous ordering of rows, you need to include the king component of the primary key in the filter using an equality condition. The ALLOW FILTERING clause, described in the next section, is also required.
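Why an equality condition on the partition key plus a range on the clustering column yields a contiguous run of rows can be shown with a small Python sketch. The data is hypothetical (made-up king and steward identifiers), and the model assumes reign_start is the clustering column; `slice_partition` is an illustrative helper, not a Cassandra API.

```python
import bisect

# Hypothetical data: partition key -> rows sorted by reign_start (clustering column)
partitions = {
    "king_a": [(2400, "s1"), (2450, "s2"), (2480, "s3"), (2500, "s4")],
    "king_b": [(2460, "s5")],
}

def slice_partition(partition_key, start, end):
    """Return rows with start <= reign_start < end from a single partition."""
    rows = partitions.get(partition_key, [])  # equality on the partition key
    starts = [reign_start for reign_start, _ in rows]
    lo = bisect.bisect_left(starts, start)    # first row with reign_start >= start
    hi = bisect.bisect_left(starts, end)      # first row with reign_start >= end
    return rows[lo:hi]                        # one contiguous slice

result = slice_partition("king_a", 2450, 2500)
```

Because the rows inside a partition are already ordered by the clustering column, the range maps to a single slice; without the equality condition there is no one partition to slice.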
  16. Add more details.