SlideShare a Scribd company logo
1 of 37
Download to read offline
Data Modeling with Apache Cassandra™
DuyHai DOAN
Apache Cassandra™ Evangelist, Apache Zeppelin™ committer
1 Data modeling objectives
2 The partition key
3 The clustering column(s)
4 Other critical details
5 Workshop
© DataStax, All Rights Reserved. 2
Data Modeling Objectives
Data modeling objectives
1) Get your data out of Cassandra
2) Reduce query latency, make your queries faster
3) Avoid disaster in production
© DataStax, All Rights Reserved. 4
Data modeling objectives
1) Get your data out of Cassandra
2) Reduce query latency, make your queries faster
3) Avoid disaster in production
© DataStax, All Rights Reserved. 5
Data modeling objectives
1) Get your data out of Cassandra
2) Reduce query latency, make your queries faster
3) Avoid disaster in production
© DataStax, All Rights Reserved. 6
Data modeling methodology
Design by query
•  first, know your functional queries (find users by xxx, …)
•  then design the table(s) for direct access
•  denormalize if necessary
Output of design phase = schema.cql
Then start coding
© DataStax, All Rights Reserved. 7
The partition key
Role
Partition key
•  main entry point for query (INSERT/SELECT …)
•  help distribute/locate data on the cluster
No partition key = full cluster scan 😱
© DataStax, All Rights Reserved. 9
Data distribution
© DataStax, All Rights Reserved.
A: −x,−
3x
4
⎤
⎦
⎥
⎥
⎤
⎦
⎥
⎥
B: −
3x
4
,−
2x
4
⎤
⎦
⎥
⎥
⎤
⎦
⎥
⎥
C: −
2x
4
,−
x
4
⎤
⎦
⎥
⎥
⎤
⎦
⎥
⎥
D: −
x
4
,0
⎤
⎦
⎥
⎥
⎤
⎦
⎥
⎥
E: 0,
x
4
⎤
⎦
⎥
⎥
⎤
⎦
⎥
⎥
F:
x
4
,
2x
4
⎤
⎦
⎥
⎥
⎤
⎦
⎥
⎥
G:
2x
4
,
3x
4
⎤
⎦
⎥
⎥
⎤
⎦
⎥
⎥
H :
3x
4
,x
⎤
⎦
⎥
⎥
⎤
⎦
⎥
⎥
H
A
E
D
B C
G F
Partition key à hash value
Hash ranges:
10
C: −
2x
4
,−
x
4
⎤
⎦
⎥
⎥
⎤
⎦
⎥
⎥
Query by partition key
© DataStax, All Rights Reserved.
SELECT * FROM users WHERE user_id = user_id3
H
A
E
D
B C
G F
user_id3
request
Hash(user_id3)
C: −
2x
4
,−
x
4
⎤
⎦
⎥
⎥
⎤
⎦
⎥
⎥
11
How to choose correct partition key ?
Good partition column
•  functional identifier
•  high cardinality (lots of distinct values)
© DataStax, All Rights Reserved. 12
Example of good partition key
© DataStax, All Rights Reserved.
CREATE TABLE users(
user_id int,
…,
PRIMARY KEY(user_id))
H
A
E
D
B C
G F
user_id1
user_id2
user_id3
user_id4
user_id5
13
Example of bad partition key
© DataStax, All Rights Reserved.
CREATE TABLE product(
category text,
…,
PRIMARY KEY(category))
H
A
E
D
B C
G F
category1
14
category2
category3
Composite partition key
Multiple columns for partition key
•  always known in advance (INSERT/SELECT …)
•  hashed together to the same token value
© DataStax, All Rights Reserved.
CREATE TABLE product(
product_id uuid,
shop_id uuid,
product_name text,
…,
PRIMARY KEY((product_id,shop_id))
SELECT * FROM product WHERE product_id = … AND shop_id = …
15
SELECT * FROM product WHERE product_id = … AND shop_id IN (xxx, yyy …)
The clustering column(s)
Role
Clustering column(s)
•  simulate 1 – N relationship
•  sort data (logically & on disk)
© DataStax, All Rights Reserved. 17
Example
© DataStax, All Rights Reserved. 18
CREATE TABLE sensor_data (
sensor_id uuid,
date timestamp,
value double,
type text,
unit text,
PRIMARY KEY(sensor_id, date))
Partition key Clustering column
Recommended syntax
PRIMARY KEY((sensor_id), date))
Columns relationship
© DataStax, All Rights Reserved. 19
CREATE TABLE sensor_data (
sensor_id uuid,
date timestamp,
value double,
type text,
unit text,
PRIMARY KEY((sensor_id), date))
sensor_id (1) <------------> (N) date
Columns relationship
© DataStax, All Rights Reserved. 20
CREATE TABLE sensor_data (
sensor_id uuid,
date timestamp,
value double,
type text,
unit text,
PRIMARY KEY((sensor_id), date))
date (1) <---------> (1) (value, type, unit)
Clustering column sort
© DataStax, All Rights Reserved. 21
SELECT * FROM sensor_data
WHERE sensor_id = … AND date = '2016-06-14 10:00:00.000'
Direct key lookup
Range queries
SELECT * FROM sensor_data
WHERE sensor_id = … AND date >= '2016-06-14 10:00:00.000'
SELECT * FROM sensor_data
WHERE sensor_id = … AND date <= '2016-06-14 10:00:00.000'
SELECT * FROM sensor_data
WHERE sensor_id = …
AND date >= '2016-06-14 10:00:00.000’
AND date <= '2016-06-14 12:00:00.000’
On-disk clustering column sort
© DataStax, All Rights Reserved. 22
sensor_id1
2016-06-14 10:00:00.0000 2016-06-14 11:00:00.0000 ...
23.4 Temperature Celsius 23.7 Temperature Celsius … … …
sensor_id7
2016-06-14 10:00:00.0000 2016-06-14 11:00:00.0000 ...
21.3 Temperature Celsius 20.1 Temperature Celsius … … …
Node A
sensor_id5
2016-06-14 10:00:00.0000 2016-06-14 11:00:00.0000 ...
23.4 Temperature Celsius 23.7 Temperature Celsius … … …
sensor_id2
2016-06-14 10:00:00.0000 2016-06-14 11:00:00.0000 ...
21.3 Temperature Celsius 20.1 Temperature Celsius … … …
Node C
Reverse clustering sort
© DataStax, All Rights Reserved. 23
CREATE TABLE sensor_data (
sensor_id uuid,
date timestamp,
value double,
type text,
unit text,
PRIMARY KEY((sensor_id), date))
WITH CLUSTERING ORDER BY (date DESC)
sensor_id1
2016-06-14 11:00:00.0000 2016-06-14 10:00:00.0000 ...
23.7 Temperature Celsius 23.4 Temperature Celsius … … …
Multiple clustering columns
© DataStax, All Rights Reserved. 24
CREATE TABLE sensor_data (
sensor_id uuid,
priority int,
date timestamp,
value double,
type text,
unit text,
PRIMARY KEY((sensor_id), priority, date))
WITH CLUSTERING ORDER BY (priority ASC, date DESC)
sensor_id1
1 2
2016-06-14 11:00:00.0000 2016-06-14 10:00:00.0000 ...
23.7 Temperature Celsius 23.4 Temperature Celsius … … …
priority (1) <---------> (N) date
Multiple clustering columns
© DataStax, All Rights Reserved. 25
WHERE sensor_id = … AND priority = … AND date = …
WHERE sensor_id = … AND priority = … AND date >= … AND date <= …
WHERE sensor_id = … AND priority >= … AND priority <=
Nested map abstraction:
Map<sensor_id,
SortedMap<priority,
SortedMap<date, (value,type,unit)>>>
PRIMARY KEY((sensor_id), priority, date))
Primary key summary
© DataStax, All Rights Reserved. 26
PRIMARY KEY((sensor_id), priority, date))
Unicity of (sensor_id, priority, date)
Primary key summary
© DataStax, All Rights Reserved. 27
PRIMARY KEY((sensor_id), priority, date))
Used to locate node in the cluster
Used to locate partition in the node
Primary key summary
© DataStax, All Rights Reserved. 28
PRIMARY KEY((sensor_id), priority, date))
Used to lookup rows in a partition
Used for data sorting and range queries
Other critical details
Huge partitions
© DataStax, All Rights Reserved. 30
PRIMARY KEY((sensor_id), date))
Data for the same sensor stay in the same partition on disk
Huge partitions
© DataStax, All Rights Reserved. 31
PRIMARY KEY((sensor_id), date))
Data for the same sensor stay in the same partition on disk
If insert rate = 100/sec, how big is my partition after 1 year ?
à 100 x 3600 x 24 x 365= 3 153 600 000 cells on disks
Huge partitions
© DataStax, All Rights Reserved. 32
PRIMARY KEY((sensor_id), date))
Theorical limit of # cells for a partition = 2 x 109
Practical limit for a partition on disk
•  100Mb
•  100 000 – 1000 000 cells
Reasons ? Make maintenance operations easier
•  compaction
•  repair
•  bootstrap …
Sub-partitioning techniques
© DataStax, All Rights Reserved. 33
PRIMARY KEY((sensor_id, day), date))
à 100 x 3600 x 24 = 8 640 000 cells on disks ✔︎
Sub-partitioning techniques
© DataStax, All Rights Reserved. 34
PRIMARY KEY((sensor_id, day), date))
à 100 x 3600 x 24 = 8 640 000 cells on disks ✔︎
But impact on queries:
•  need to provide sensor_id & day for any query
•  how to fetch data across N days ?
Data deletion and tombstones
© DataStax, All Rights Reserved. 35
DELETE FROM sensor_data
WHERE sensor_id = .. AND date = …
Logical deletion of data but:
•  new physical "tombstone" column on disk
•  disk space usage will increase !
The "tombstone" columns will be purged later by
compaction process …
Workshop
Workshop setup
Full setup and materials are here
https://github.com/
VincentPoncet/DataStaxDay
© DataStax, All Rights Reserved. 37

More Related Content

What's hot

From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016DataStax
 
Cassandra introduction 2016
Cassandra introduction 2016Cassandra introduction 2016
Cassandra introduction 2016Duyhai Doan
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valleyPatrick McFadin
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache CassandraPatrick McFadin
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016Duyhai Doan
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataVictor Coustenoble
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraPatrick McFadin
 
Spark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesSpark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesRussell Spitzer
 
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisReal time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisDuyhai Doan
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureDataStax Academy
 
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryFrustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryIlya Ganelin
 
Cassandra nice use cases and worst anti patterns no sql-matters barcelona
Cassandra nice use cases and worst anti patterns no sql-matters barcelonaCassandra nice use cases and worst anti patterns no sql-matters barcelona
Cassandra nice use cases and worst anti patterns no sql-matters barcelonaDuyhai Doan
 
Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015StampedeCon
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingVassilis Bekiaris
 
Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized ViewsCarl Yeksigian
 
Spark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and FutureSpark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and FutureRussell Spitzer
 
Using Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraUsing Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraJim Hatcher
 
Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Jon Haddad
 

What's hot (20)

From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016
 
Cassandra introduction 2016
Cassandra introduction 2016Cassandra introduction 2016
Cassandra introduction 2016
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valley
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016Sasi, cassandra on the full text search ride At  Voxxed Day Belgrade 2016
Sasi, cassandra on the full text search ride At Voxxed Day Belgrade 2016
 
Spark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational DataSpark + Cassandra = Real Time Analytics on Operational Data
Spark + Cassandra = Real Time Analytics on Operational Data
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
Introduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandraIntroduction to data modeling with apache cassandra
Introduction to data modeling with apache cassandra
 
Spark Cassandra Connector Dataframes
Spark Cassandra Connector DataframesSpark Cassandra Connector Dataframes
Spark Cassandra Connector Dataframes
 
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisReal time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
 
Spark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and FurureSpark Cassandra Connector: Past, Present and Furure
Spark Cassandra Connector: Past, Present and Furure
 
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series LibraryFrustration-Reduced Spark: DataFrames and the Spark Time-Series Library
Frustration-Reduced Spark: DataFrames and the Spark Time-Series Library
 
Cassandra nice use cases and worst anti patterns no sql-matters barcelona
Cassandra nice use cases and worst anti patterns no sql-matters barcelonaCassandra nice use cases and worst anti patterns no sql-matters barcelona
Cassandra nice use cases and worst anti patterns no sql-matters barcelona
 
Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015Cassandra 3.0 - JSON at scale - StampedeCon 2015
Cassandra 3.0 - JSON at scale - StampedeCon 2015
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
 
Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized Views
 
Spark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and FutureSpark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and Future
 
Using Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraUsing Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into Cassandra
 
Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Intro to py spark (and cassandra)
Intro to py spark (and cassandra)
 

Viewers also liked

Spark Cassandra 2016
Spark Cassandra 2016Spark Cassandra 2016
Spark Cassandra 2016Duyhai Doan
 
Cassandra 3 new features @ Geecon Krakow 2016
Cassandra 3 new features  @ Geecon Krakow 2016Cassandra 3 new features  @ Geecon Krakow 2016
Cassandra 3 new features @ Geecon Krakow 2016Duyhai Doan
 
Spark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronDuyhai Doan
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connectorDuyhai Doan
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data ModelingMatthew Dennis
 
Apache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceApache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceDataStax Academy
 
Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3Markus Klems
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014Patrick McFadin
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jugDuyhai Doan
 
Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016Duyhai Doan
 
Cassandra introduction @ NantesJUG
Cassandra introduction @ NantesJUGCassandra introduction @ NantesJUG
Cassandra introduction @ NantesJUGDuyhai Doan
 
Cassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGCassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGDuyhai Doan
 
Cassandra drivers and libraries
Cassandra drivers and librariesCassandra drivers and libraries
Cassandra drivers and librariesDuyhai Doan
 
KillrChat Data Modeling
KillrChat Data ModelingKillrChat Data Modeling
KillrChat Data ModelingDuyhai Doan
 
KillrChat presentation
KillrChat presentationKillrChat presentation
KillrChat presentationDuyhai Doan
 
Introduction to KillrChat
Introduction to KillrChatIntroduction to KillrChat
Introduction to KillrChatDuyhai Doan
 
Fast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGFast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGDuyhai Doan
 
Spark cassandra integration 2016
Spark cassandra integration 2016Spark cassandra integration 2016
Spark cassandra integration 2016Duyhai Doan
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...DataStax
 

Viewers also liked (20)

Spark Cassandra 2016
Spark Cassandra 2016Spark Cassandra 2016
Spark Cassandra 2016
 
Cassandra 3 new features @ Geecon Krakow 2016
Cassandra 3 new features  @ Geecon Krakow 2016Cassandra 3 new features  @ Geecon Krakow 2016
Cassandra 3 new features @ Geecon Krakow 2016
 
Spark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotronSpark zeppelin-cassandra at synchrotron
Spark zeppelin-cassandra at synchrotron
 
Cassandra spark connector
Cassandra spark connectorCassandra spark connector
Cassandra spark connector
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data Modeling
 
Apache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis PriceApache Cassandra Data Modeling with Travis Price
Apache Cassandra Data Modeling with Travis Price
 
Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3
 
Introduction to cassandra 2014
Introduction to cassandra 2014Introduction to cassandra 2014
Introduction to cassandra 2014
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jug
 
Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016Apache Zeppelin @DevoxxFR 2016
Apache Zeppelin @DevoxxFR 2016
 
Cassandra introduction @ NantesJUG
Cassandra introduction @ NantesJUGCassandra introduction @ NantesJUG
Cassandra introduction @ NantesJUG
 
Cassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGCassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUG
 
Cassandra drivers and libraries
Cassandra drivers and librariesCassandra drivers and libraries
Cassandra drivers and libraries
 
KillrChat Data Modeling
KillrChat Data ModelingKillrChat Data Modeling
KillrChat Data Modeling
 
KillrChat presentation
KillrChat presentationKillrChat presentation
KillrChat presentation
 
Introduction to KillrChat
Introduction to KillrChatIntroduction to KillrChat
Introduction to KillrChat
 
Fast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ INGFast track to getting started with DSE Max @ ING
Fast track to getting started with DSE Max @ ING
 
Spark cassandra integration 2016
Spark cassandra integration 2016Spark cassandra integration 2016
Spark cassandra integration 2016
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
 
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
 

Similar to Datastax day 2016 : Cassandra data modeling basics

Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101DataStax Academy
 
Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101DataStax Academy
 
Cassandra Day Atlanta 2015: Data Modeling 101
Cassandra Day Atlanta 2015: Data Modeling 101Cassandra Day Atlanta 2015: Data Modeling 101
Cassandra Day Atlanta 2015: Data Modeling 101DataStax Academy
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)Michael Rys
 
Oracle 122 partitioning_in_action_slide_share
Oracle 122 partitioning_in_action_slide_shareOracle 122 partitioning_in_action_slide_share
Oracle 122 partitioning_in_action_slide_shareThomas Teske
 
Tech Talk: Best Practices for Data Modeling
Tech Talk: Best Practices for Data ModelingTech Talk: Best Practices for Data Modeling
Tech Talk: Best Practices for Data ModelingScyllaDB
 
SRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon RedshiftSRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon RedshiftAmazon Web Services
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseDataStax
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...InfluxData
 
How to get Real-Time Value from your IoT Data - Datastax
How to get Real-Time Value from your IoT Data - DatastaxHow to get Real-Time Value from your IoT Data - Datastax
How to get Real-Time Value from your IoT Data - DatastaxDataStax
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...DataStax
 
Webinar - Bringing connected graph data to Cassandra with DSE Graph
Webinar - Bringing connected graph data to Cassandra with DSE GraphWebinar - Bringing connected graph data to Cassandra with DSE Graph
Webinar - Bringing connected graph data to Cassandra with DSE GraphDataStax
 
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Citus Data
 
AWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAmazon Web Services
 
Bulletproof Jobs: Patterns For Large-Scale Spark Processing
Bulletproof Jobs: Patterns For Large-Scale Spark ProcessingBulletproof Jobs: Patterns For Large-Scale Spark Processing
Bulletproof Jobs: Patterns For Large-Scale Spark ProcessingSpark Summit
 
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016DataStax
 
Storing Cassandra Metrics
Storing Cassandra MetricsStoring Cassandra Metrics
Storing Cassandra MetricsChris Lohfink
 
Extra performance out of thin air
Extra performance out of thin airExtra performance out of thin air
Extra performance out of thin airKonstantine Krutiy
 

Similar to Datastax day 2016 : Cassandra data modeling basics (20)

Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
Cassandra Day Chicago 2015: Apache Cassandra Data Modeling 101
 
Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101Cassandra Day London 2015: Data Modeling 101
Cassandra Day London 2015: Data Modeling 101
 
Cassandra Day Atlanta 2015: Data Modeling 101
Cassandra Day Atlanta 2015: Data Modeling 101Cassandra Day Atlanta 2015: Data Modeling 101
Cassandra Day Atlanta 2015: Data Modeling 101
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)
 
Oracle 122 partitioning_in_action_slide_share
Oracle 122 partitioning_in_action_slide_shareOracle 122 partitioning_in_action_slide_share
Oracle 122 partitioning_in_action_slide_share
 
Tech Talk: Best Practices for Data Modeling
Tech Talk: Best Practices for Data ModelingTech Talk: Best Practices for Data Modeling
Tech Talk: Best Practices for Data Modeling
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
SRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon RedshiftSRV405 Ancestry's Journey to Amazon Redshift
SRV405 Ancestry's Journey to Amazon Redshift
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
 
How to get Real-Time Value from your IoT Data - Datastax
How to get Real-Time Value from your IoT Data - DatastaxHow to get Real-Time Value from your IoT Data - Datastax
How to get Real-Time Value from your IoT Data - Datastax
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
 
Webinar - Bringing connected graph data to Cassandra with DSE Graph
Webinar - Bringing connected graph data to Cassandra with DSE GraphWebinar - Bringing connected graph data to Cassandra with DSE Graph
Webinar - Bringing connected graph data to Cassandra with DSE Graph
 
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
Data Modeling, Normalization, and Denormalisation | PostgreSQL Conference Eur...
 
AWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing PerformanceAWS July Webinar Series: Amazon Redshift Optimizing Performance
AWS July Webinar Series: Amazon Redshift Optimizing Performance
 
Bulletproof Jobs: Patterns For Large-Scale Spark Processing
Bulletproof Jobs: Patterns For Large-Scale Spark ProcessingBulletproof Jobs: Patterns For Large-Scale Spark Processing
Bulletproof Jobs: Patterns For Large-Scale Spark Processing
 
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
Storing Cassandra Metrics (Chris Lohfink, DataStax) | C* Summit 2016
 
Storing Cassandra Metrics
Storing Cassandra MetricsStoring Cassandra Metrics
Storing Cassandra Metrics
 
Extra performance out of thin air
Extra performance out of thin airExtra performance out of thin air
Extra performance out of thin air
 

More from Duyhai Doan

Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...Duyhai Doan
 
Le futur d'apache cassandra
Le futur d'apache cassandraLe futur d'apache cassandra
Le futur d'apache cassandraDuyhai Doan
 
Big data 101 for beginners devoxxpl
Big data 101 for beginners devoxxplBig data 101 for beginners devoxxpl
Big data 101 for beginners devoxxplDuyhai Doan
 
Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016Duyhai Doan
 
Cassandra 3 new features 2016
Cassandra 3 new features 2016Cassandra 3 new features 2016
Cassandra 3 new features 2016Duyhai Doan
 
Apache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemApache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemDuyhai Doan
 
Cassandra UDF and Materialized Views
Cassandra UDF and Materialized ViewsCassandra UDF and Materialized Views
Cassandra UDF and Materialized ViewsDuyhai Doan
 
Data stax academy
Data stax academyData stax academy
Data stax academyDuyhai Doan
 
Apache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemApache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemDuyhai Doan
 
Distributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeConDistributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeConDuyhai Doan
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceDuyhai Doan
 
Spark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesSpark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesDuyhai Doan
 
Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015Duyhai Doan
 

More from Duyhai Doan (13)

Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
Pourquoi Terraform n'est pas le bon outil pour les déploiements automatisés d...
 
Le futur d'apache cassandra
Le futur d'apache cassandraLe futur d'apache cassandra
Le futur d'apache cassandra
 
Big data 101 for beginners devoxxpl
Big data 101 for beginners devoxxplBig data 101 for beginners devoxxpl
Big data 101 for beginners devoxxpl
 
Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016Algorithme distribués pour big data saison 2 @DevoxxFR 2016
Algorithme distribués pour big data saison 2 @DevoxxFR 2016
 
Cassandra 3 new features 2016
Cassandra 3 new features 2016Cassandra 3 new features 2016
Cassandra 3 new features 2016
 
Apache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystemApache zeppelin the missing component for the big data ecosystem
Apache zeppelin the missing component for the big data ecosystem
 
Cassandra UDF and Materialized Views
Cassandra UDF and Materialized ViewsCassandra UDF and Materialized Views
Cassandra UDF and Materialized Views
 
Data stax academy
Data stax academyData stax academy
Data stax academy
 
Apache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystemApache zeppelin, the missing component for the big data ecosystem
Apache zeppelin, the missing component for the big data ecosystem
 
Distributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeConDistributed algorithms for big data @ GeeCon
Distributed algorithms for big data @ GeeCon
 
Spark cassandra integration, theory and practice
Spark cassandra integration, theory and practiceSpark cassandra integration, theory and practice
Spark cassandra integration, theory and practice
 
Spark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesSpark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-Cases
 
Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015Algorithmes distribues pour le big data @ DevoxxFR 2015
Algorithmes distribues pour le big data @ DevoxxFR 2015
 

Recently uploaded

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Datastax day 2016 : Cassandra data modeling basics

  • 1. Data Modeling with Apache Cassandra™ DuyHai DOAN Apache Cassandra™ Evangelist, Apache Zeppelin™ committer
  • 2. 1 Data modeling objectives 2 The partition key 3 The clustering column(s) 4 Other critical details 5 Workshop © DataStax, All Rights Reserved. 2
  • 4. Data modeling objectives 1) Get your data out of Cassandra 2) Reduce query latency, make your queries faster 3) Avoid disaster in production © DataStax, All Rights Reserved. 4
  • 5. Data modeling objectives 1) Get your data out of Cassandra 2) Reduce query latency, make your queries faster 3) Avoid disaster in production © DataStax, All Rights Reserved. 5
  • 6. Data modeling objectives 1) Get your data out of Cassandra 2) Reduce query latency, make your queries faster 3) Avoid disaster in production © DataStax, All Rights Reserved. 6
  • 7. Data modeling methodology Design by query •  first, know your functional queries (find users by xxx, …) •  then design the table(s) for direct access •  denormalize if necessary Output of design phase = schema.cql Then start coding © DataStax, All Rights Reserved. 7
  • 9. Role Partition key •  main entry point for query (INSERT/SELECT …) •  help distribute/locate data on the cluster No partition key = full cluster scan 😱 © DataStax, All Rights Reserved. 9
  • 10. Data distribution © DataStax, All Rights Reserved. A: −x,− 3x 4 ⎤ ⎦ ⎥ ⎥ ⎤ ⎦ ⎥ ⎥ B: − 3x 4 ,− 2x 4 ⎤ ⎦ ⎥ ⎥ ⎤ ⎦ ⎥ ⎥ C: − 2x 4 ,− x 4 ⎤ ⎦ ⎥ ⎥ ⎤ ⎦ ⎥ ⎥ D: − x 4 ,0 ⎤ ⎦ ⎥ ⎥ ⎤ ⎦ ⎥ ⎥ E: 0, x 4 ⎤ ⎦ ⎥ ⎥ ⎤ ⎦ ⎥ ⎥ F: x 4 , 2x 4 ⎤ ⎦ ⎥ ⎥ ⎤ ⎦ ⎥ ⎥ G: 2x 4 , 3x 4 ⎤ ⎦ ⎥ ⎥ ⎤ ⎦ ⎥ ⎥ H : 3x 4 ,x ⎤ ⎦ ⎥ ⎥ ⎤ ⎦ ⎥ ⎥ H A E D B C G F Partition key à hash value Hash ranges: 10 C: − 2x 4 ,− x 4 ⎤ ⎦ ⎥ ⎥ ⎤ ⎦ ⎥ ⎥
  • 11. Query by partition key © DataStax, All Rights Reserved. SELECT * FROM users WHERE user_id = user_id3 H A E D B C G F user_id3 request Hash(user_id3) C: − 2x 4 ,− x 4 ⎤ ⎦ ⎥ ⎥ ⎤ ⎦ ⎥ ⎥ 11
  • 12. How to choose correct partition key ? Good partition column •  functional identifier •  high cardinality (lots of distinct values) © DataStax, All Rights Reserved. 12
  • 13. Example of good partition key © DataStax, All Rights Reserved. CREATE TABLE users( user_id int, …, PRIMARY KEY(user_id)) H A E D B C G F user_id1 user_id2 user_id3 user_id4 user_id5 13
  • 14. Example of bad partition key © DataStax, All Rights Reserved. CREATE TABLE product( category text, …, PRIMARY KEY(category)) H A E D B C G F category1 14 category2 category3
  • 15. Composite partition key Multiple columns for partition key •  always known in advance (INSERT/SELECT …) •  hashed together to the same token value © DataStax, All Rights Reserved. CREATE TABLE product( product_id uuid, shop_id uuid, product_name text, …, PRIMARY KEY((product_id,shop_id)) SELECT * FROM product WHERE product_id = … AND shop_id = … 15 SELECT * FROM product WHERE product_id = … AND shop_id IN (xxx, yyy …)
  • 17. Role Clustering column(s) •  simulate 1 – N relationship •  sort data (logically & on disk) © DataStax, All Rights Reserved. 17
  • 18. Example © DataStax, All Rights Reserved. 18 CREATE TABLE sensor_data ( sensor_id uuid, date timestamp, value double, type text, unit text, PRIMARY KEY(sensor_id, date)) Partition key Clustering column Recommended syntax PRIMARY KEY((sensor_id), date))
  • 19. Columns relationship © DataStax, All Rights Reserved. 19 CREATE TABLE sensor_data ( sensor_id uuid, date timestamp, value double, type text, unit text, PRIMARY KEY((sensor_id), date)) sensor_id (1) <------------> (N) date
  • 20. Columns relationship © DataStax, All Rights Reserved. 20 CREATE TABLE sensor_data ( sensor_id uuid, date timestamp, value double, type text, unit text, PRIMARY KEY((sensor_id), date)) date (1) <---------> (1) (value, type, unit)
  • 21. Clustering column sort © DataStax, All Rights Reserved. 21 SELECT * FROM sensor_data WHERE sensor_id = … AND date = '2016-06-14 10:00:00.000' Direct key lookup Range queries SELECT * FROM sensor_data WHERE sensor_id = … AND date >= '2016-06-14 10:00:00.000' SELECT * FROM sensor_data WHERE sensor_id = … AND date <= '2016-06-14 10:00:00.000' SELECT * FROM sensor_data WHERE sensor_id = … AND date >= '2016-06-14 10:00:00.000’ AND date <= '2016-06-14 12:00:00.000’
  • 22. On-disk clustering column sort © DataStax, All Rights Reserved. 22 sensor_id1 2016-06-14 10:00:00.0000 2016-06-14 11:00:00.0000 ... 23.4 Temperature Celsius 23.7 Temperature Celsius … … … sensor_id7 2016-06-14 10:00:00.0000 2016-06-14 11:00:00.0000 ... 21.3 Temperature Celsius 20.1 Temperature Celsius … … … Node A sensor_id5 2016-06-14 10:00:00.0000 2016-06-14 11:00:00.0000 ... 23.4 Temperature Celsius 23.7 Temperature Celsius … … … sensor_id2 2016-06-14 10:00:00.0000 2016-06-14 11:00:00.0000 ... 21.3 Temperature Celsius 20.1 Temperature Celsius … … … Node C
  • 23. Reverse clustering sort © DataStax, All Rights Reserved. 23 CREATE TABLE sensor_data ( sensor_id uuid, date timestamp, value double, type text, unit text, PRIMARY KEY((sensor_id), date)) WITH CLUSTERING ORDER BY (date DESC) sensor_id1 2016-06-14 11:00:00.0000 2016-06-14 10:00:00.0000 ... 23.7 Temperature Celsius 23.4 Temperature Celsius … … …
  • 24. Multiple clustering columns © DataStax, All Rights Reserved. 24 CREATE TABLE sensor_data ( sensor_id uuid, priority int, date timestamp, value double, type text, unit text, PRIMARY KEY((sensor_id), priority, date)) WITH CLUSTERING ORDER BY (priority ASC, date DESC) sensor_id1 1 2 2016-06-14 11:00:00.0000 2016-06-14 10:00:00.0000 ... 23.7 Temperature Celsius 23.4 Temperature Celsius … … … priority (1) <---------> (N) date
  • 25. Multiple clustering columns © DataStax, All Rights Reserved. 25 WHERE sensor_id = … AND priority = … AND date = … WHERE sensor_id = … AND priority = … AND date >= … AND date <= … WHERE sensor_id = … AND priority >= … AND priority <= Nested map abstraction: Map<sensor_id, SortedMap<priority, SortedMap<date, (value,type,unit)>>> PRIMARY KEY((sensor_id), priority, date))
  • 26. Primary key summary © DataStax, All Rights Reserved. 26 PRIMARY KEY((sensor_id), priority, date)) Unicity of (sensor_id, priority, date)
  • 27. Primary key summary © DataStax, All Rights Reserved. 27 PRIMARY KEY((sensor_id), priority, date)) Used to locate node in the cluster Used to locate partition in the node
  • 28. Primary key summary © DataStax, All Rights Reserved. 28 PRIMARY KEY((sensor_id), priority, date)) Used to lookup rows in a partition Used for data sorting and range queries
  • 30. Huge partitions © DataStax, All Rights Reserved. 30 PRIMARY KEY((sensor_id), date)) Data for the same sensor stay in the same partition on disk
  • 31. Huge partitions © DataStax, All Rights Reserved. 31 PRIMARY KEY((sensor_id), date)) Data for the same sensor stay in the same partition on disk If insert rate = 100/sec, how big is my partition after 1 year ? à 100 x 3600 x 24 x 365= 3 153 600 000 cells on disks
  • 32. Huge partitions © DataStax, All Rights Reserved. 32 PRIMARY KEY((sensor_id), date)) Theorical limit of # cells for a partition = 2 x 109 Practical limit for a partition on disk •  100Mb •  100 000 – 1000 000 cells Reasons ? Make maintenance operations easier •  compaction •  repair •  bootstrap …
  • 33. Sub-partitioning techniques © DataStax, All Rights Reserved. 33 PRIMARY KEY((sensor_id, day), date)) à 100 x 3600 x 24 = 8 640 000 cells on disks ✔︎
  • 34. Sub-partitioning techniques © DataStax, All Rights Reserved. 34 PRIMARY KEY((sensor_id, day), date)) à 100 x 3600 x 24 = 8 640 000 cells on disks ✔︎ But impact on queries: •  need to provide sensor_id & day for any query •  how to fetch data across N days ?
  • 35. Data deletion and tombstones © DataStax, All Rights Reserved. 35 DELETE FROM sensor_data WHERE sensor_id = .. AND date = … Logical deletion of data but: •  new physical "tombstone" column on disk •  disk space usage will increase ! The "tombstone" columns will be purged later by compaction process …
  • 37. Workshop setup Full setup and materials are here https://github.com/ VincentPoncet/DataStaxDay © DataStax, All Rights Reserved. 37