Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point

Best Better practice of Cassandra
Cassandraに不向きなCassandraデータモデリング基礎
Hayato Tsutsumi
Works Applications

Hayato Tsutsumi
堤勇人
Cassandra experience :
7 years (from ver.0.6)
Certification for Apache Cassandra Administarator
Data size : about 40TB
Nodes：about 40 (would increase soon...)
Twitter : 2t3
Site Reliability Engineering Div.
Works Applications Co., Ltd
自己紹介 Speaker

Target
● Mid-range System
Data size
1TB ~ 1PB
Data amount
10 Mil ～ 100 Bil
+ high speed processing

Best practice = Right people, right place
適材適所は確かにベスト
Suitable
Data

But not all data is suitable
じゃあベストじゃない部分は？
Our system
Suitable
Data
Un-suitable data for Cassandra

Use both Cassandra and RDB?
O*
or
M*
Suitable
Data

You may use only RDB...
O*
or
M*
Suitable
Data

Another way : Manage data only with Cassandra
Suitable
Data

3 models unlike NoSQL
Historical data
履歴管理データ
Tree structure
ツリー構造
Summarized data
計上データ

How Cassandra
read data in 3mins
前
提

Partition key &
Clustering key
CREATE TABLE test_table (
pkA text,
pkB text,
ckA text,
ckB text,
v text,
w text,
PRIMARY KEY ((pkA, pkB), ckA, ckB)
);
Partition Key Clustering Key
hash(pkA1
,pkB1)
Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w
Value v1 w1 v2 w2
hash(pkA1
,pkB2)
Column ckA3:ckB3:v ckA3:ckB3:w
Value v3 w3
Column pkA pkB ckA ckB v w
Value pkA1 pkB1 ckA1 ckB1 v1 w1
pkA1 pkB1 ckA1 ckB2 v2 w2
on Table
on Cassandra Cassandra can search Column name

Partition key &
Clustering key
pkA text,
pkB int,
ckA int,
ckB int,
v text,
w text,
);
hash(pkA1
,pkB1)
Column ckA1:ckB1:v ckA1:ckB1:w ckA1:ckB2:v ckA1:ckB2:w
Value v1 w1 v2 w2
hash(pkA1
,pkB2)
Column ckA3:ckB3:v ckA3:ckB3:w
Value v3 w3
Column pkA pkB ckA ckB v w
Value pkA1 pkB1 ckA1 ckB1 v1 w1
on Table
on Cassandra Cassandra can search Column name

Partition key &
Clustering key
pkA text,
pkB int,
ckA int,
ckB int,
v text,
w text,
);
where pkA = "pkA1"; //NG
where pkA = "pkA1" and pkB = "pkB1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"
and ckB = "ckB1"; //OK
where pkA = "pkA1" and ckA = "ckA1"; //NG
where pkA = "pkA1" and pkB = "pkB1" and ckB = "ckB1"; //NG
where pkA = "pkA1" and pkB >= "pkB1"; //NG
where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA = "ckA1"
and ckB >= "ckB1"; //OK
where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"
and ckB = "ckB1"; //NG
where pkA = "pkA1" and pkB = "pkB1" and ckA >= "ckA1"
and ckA < "ckA2"; //OK

Historical data
履歴管理データ
photo by Bryan Wright
(https://secure.flickr.com/photos/spidermandragon5/2922128673/)

社員の異動情報
Employee transfer history
A Div. B Div. C Div.
A Div. C Div. D Div.
emp001
emp002
D Div. E Div.emp003
4/1 4/16 5/13/112/1 2/21

社員の異動情報
Employee transfer history
A Div. B Div. C Div.
A Div. C Div. D Div.
emp001
emp002
D Div. E Div.emp003
4/1 4/16 5/13/112/1 2/21
at 3/25
emp001 emp002 emp003
B Div. A Div. E Div.
at 4/25
emp001 emp002 emp003
C Div. D Div. E Div.

emp_history table
CREATE TABLE emp_history (
id text,
no text,
s date,
e date,
div text,
PRIMARY KEY (id, s, e, no)
);
select * from emp_history
where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //NG
?

emp_history table
CREATE TABLE emp_history (
id text,
no text,
s date,
e date,
div text,
PRIMARY KEY (id, s, no)
);
CREATE CUSTOM INDEX fn_e ON
emp_history (e) USING
'org.apache.cassandra.index.sasi.
SASIIndex';
select * from emp_history
where id = 'test' and s <= '2017/03/25' and e > ''2017/03/25''; //OK
use custom index

Tree structure
ツリー構造

組織構造
Organization tree
A Div. a Dept.
b Dept.
1 Sec.
2 Sec.
3 Sec.
4 Sec.
5 Sec.
Well known models
● Adjacency list
● Path Enumeration
● Nested set
● Closure table

判断のポイント
Criteria
● No join, recursive query
● Anyway need consistency
● Jaywalk or
denormalization is Natural
● JOIN、再帰問い合わせ
不可
● 整合性はどの道別の方
法で取る必要がある
● ジェイウォーク、非正規化
も当たり前

ツリー構造への要求
Requirement to tree model
● show ancestors
● show children
● show descendants
● show sibilings of a
● あるノードからルートまで
の全ての親を取得
● 子供を1段展開
● 子供を全て展開
● 兄弟を取得

組織構造
Organization tree
A Div. a Dept.
b Dept.
1 Sec.
2 Sec.
3 Sec.
4 Sec.
5 Sec.
Well known models
● Adjacency list
● Path Enumeration
● Nested set
● Closure table
Worth considering!

経路列挙
Path enumeration
CREATE TABLE pathenum (
id text,
fqdn text,
child text,
code text,
PRIMARY KEY (id, fqdn)
);
id fqdn child code
test A [a,b] A
test A:a [1,2] a
test A:b [3,4,5] b
test A:a:1 1
test A:a:2 2
test A:b:3 3
test A:b:4 4
test A:b:5 5
A a
b
1
2
3
4
5

経路列挙
Path enumeration
id text,
fqdn text,
child text,
code text,
);
select * from pathenum
where id = 'test' and fqdn like 'A:'; //NG
It needs 'like' search
A a
b
1
2
3
4
5

経路列挙
Path enumeration
id text,
fqdn text,
child text,
code text,
);
where id = 'test' and fqdn like 'A:'; //NG
where id = 'test' and fqdn >= 'A:' and fqdn < 'A;'; //OK
: U+003A
; U+003B
It needs 'like' search
A a
b
1
2
3
4
5

経路列挙
Path enumeration
id text,
fqdn text,
child text,
code text,
);
//show ancestors
fqdn.split(":");
//show children of a
select child from pathenum where id = 'test' and
fqdn = 'A:a';
//show descendants of A
select * from fqdntest where id = 'test' and
fqdn >= 'A:' and fqdn < 'A;';
//show sibilings of a
select p from fqdntest where
id = 'test' and fqdn = 'A';
A a
b
1
2
3
4
5

経路列挙
Path enumeration
id text,
fqdn text,
child text,
code text,
);
pros
- one access
cons
- hot spot
- range slice
- complex process when update
pros & cons

閉包テーブル
Closure table
CREATE TABLE closure_main (
id text,
v text,
PRIMARY KEY (id)
);
CREATE TABLE closure_path (
p text,
c text,
d int,
PRIMARY KEY (p, d, c)
);
id v
A A Div.
a a Dept.
b b Dept.
1 1 Sec.
2 2 Sec.
3 3 Sec.
4 4 Sec.
5 5 Sec.
p c d
A A 0
A a 1
A b 1
A 1 2
A 2 2
A 3 2
A 4 2
A 5 2
a a 0
a 1 1
p c d
a 2 1
1 1 0
2 2 0
b b 0
b 3 1
b 4 1
b 5 1
3 3 0
4 4 0
5 5 0

閉包テーブル
Closure table
id text,
v text,
PRIMARY KEY (id)
);
p text,
c text,
d int,
);
CREATE CUSTOM INDEX fn_c ON
test.closure_path (c) USING 'org.apache.
cassandra.index.sasi.SASIIndex';
p c d
A A 0
A a 1
A b 1
A 1 2
A 2 2
A 3 2
A 4 2
A 5 2
a a 0
a 1 1
p c d
a 2 1
1 1 0
2 2 0
b b 0
b 3 1
b 4 1
b 5 1
3 3 0
4 4 0
5 5 0
//show ancestors
select p from closure_path where c = '1';
select * from closure_main where id in [?];
//show children of a
select c from closure_path where p = 'a' and d
= 1;
//show descendants of A
select c from closure_path where p = 'A';
//show sibilings of a
//load a's parent = A
select * from closure_path where c = 'a';
select c from closure_path where p = 'A' and d
= 1;

pros
- Distributed
- get access
cons
- need an index
- 2 ~ 3 times access
- increase data
pros & cons
閉包テーブル
Closure table
id text,
v text,
PRIMARY KEY (id)
);
p text,
c text,
d int,
);

pros
- Distributed
- get access
cons
- need an index
- 2 ~ 3 times access
- increase data
pros & cons
閉包テーブル
Closure table
id text,
v text,
PRIMARY KEY (id)
);
p text,
c text,
d int,
);
How increase data?
When assume n-children per
node and d-depth tree,
number of data will be
proportional to d.

Summarized
data
計上データ

伝票集計処理
Aggregation of slips
Dr. Cr.
A 200 B 50
C 150

伝票集計処理
Aggregation of slips
parallel batch processing
aggregation
online streaming

要求水準
Requirements
● miscalculation = critical
● need parallel / streaming
processing
● need high speed
processing
● 誤計算は死
● バッチの並列処理、オン
ラインによるストリーミン
グ処理が必要
● 高速処理が求められる

● miscalculation = critical
● need parallel / streaming
processing
● need high speed
processing
● 誤計算は死
● バッチの並列処理、オン
ラインによるストリーミン
グ処理が必要
● 高速処理が求められる
= Consistency!
要求水準
Requirements

計上データ
Summarized data
CREATE TABLE countup (
id text PRIMARY KEY,
v counter
);
UPDATE countup SET v = v + 1 WHERE id = 'test';
Use Counter...? No.

計上データ
Summarized data
CREATE TABLE countup (
id text PRIMARY KEY,
v int
);
UPDATE countup set v = 101 where id = 'test' if v =
100;
Use update with LWT

Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point

More Related Content

Similar to Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point

More from Works Applications

Recently uploaded

Cassandraに不向きなcassandraデータモデリング基礎 / Data Modeling concepts for NoSQL weak point