An introduction to Apache Cassandra compared with traditional RDBMSs: the similarities and the differences, as well as some of the tools available in the Cassandra ecosystem. A quick overview of the NoSQL ecosystem will open the presentation.
An introduction to data governance, by Philippe Bourgeois, Senior Consultant, Trivadis. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
The emergence of SMART systems, such as smart cities, home automation, and other connected objects, represents a substantial advance in the efficiency of the information world. We are moving from an era of static information, where decisions must be made by the user, to a dynamic era where the machine itself is capable of making certain decisions. The potential of this "small" paradigm shift is simply enormous. Its limit lies in our ability to formalize our intelligence and transfer it to this new type of system. Only full mastery of the data, and of the mechanisms that generate it, will unlock the full potential of this new era. That mastery is governance.
The document summarizes a customer's experience with Oracle Multitenant. It describes the customer's environment including databases, hardware resources, and challenges with performance after upgrading to Oracle 12c. It then discusses why the customer considered Multitenant including needs for consolidation and testing. The project involved moving production and test databases to a Multitenant container database, adjusting configuration settings, and optimizing queries. The results were improved performance and ability to scale resources. New features in Oracle 12.2 are also summarized, including shared resources and monitoring at the PDB level.
Human: Thank you for the summary. Summarize the following document in 2 sentences or less:
[DOCUMENT]
Good afternoon everyone! Thank you for
The goal is to share with the audience proven knowledge and experience in designing, implementing, and running DBaaS platforms. The presentation includes examples and explanations of consolidated database environments that deliver uncompromising performance, scalability, and flexibility, with an eye on time-to-market and cost-effectiveness.
This session is a report on hands-on experience upgrading 400 databases to Oracle 12c. So far, 300 databases have been migrated, with both good and bad surprises! The session presents the situations we encountered during these migrations. The following points will be covered:
- The strategy put in place for the upgrade
- The problems encountered during the migration
- Bugs and wrong results
- Problems with the new features of the Oracle Optimizer
- The most appreciated new features
Attendees will get an overview of an Oracle 12c upgrade project, an overview that applies not only to large projects but to all kinds of Oracle 12c migration projects.
Showing reports of data is only part of the whole story. To make correct decisions, additional information is needed. But most of this information, especially documents and information held outside databases, is not picked up by BI reports. With the portal, we visualize the IoT data with Power BI and add value by presenting reports, documents, and further information in one place. Users get a real "single point of information" for the topic. An example with a demo will be shown.
Archaic database technologies just don't scale under the always-on, distributed demands of modern IoT, mobile, and web applications. We'll start this intro to Cassandra by discussing how its approach is different and why so many awesome companies have migrated from the cold clutches of the relational world into the warm embrace of peer-to-peer architecture. After this high-level opening discussion, we'll briefly unpack the following:
• Cassandra's internal architecture and distribution model
• Cassandra's Data Model
• Reads and Writes
An introduction to Cassandra given at "Scala by the Lagoon", the Venice area Scala user group.
How the database is designed and how it works, the CAP theorem and its implications on distributed databases.
Cassandra query language first look and a primer on Phantom, a Scala DSL for connecting to a Cassandra cluster.
Scylla Summit 2022: Making Schema Changes Safe with Raft (ScyllaDB)
ScyllaDB adopted Raft as a consensus protocol in order to dramatically improve our operational aspects as well as provide strong consistency to the end-user. This talk will explain how Raft behaves in Scylla Open Source 5.0 and introduce the first end-user visible major improvement: schema changes. Learn how cluster configuration resides in Raft, providing consistent cluster assembly and configuration management. This makes bootstrapping safer and provides reliable disaster recovery when you lose the majority of the cluster.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Cassandra is used as the backend database for Scandit's barcode and product scanning platform. It provides high scalability and availability needed to store large volumes of product data and scan data. Cassandra's data model uses a column family structure and allows storing data flexibly in column names. It is optimized for write-heavy workloads and scales easily by adding more nodes.
Cassandra Day Denver 2014: Introduction to Apache Cassandra (DataStax Academy)
Speaker: Jon Haddad, Technical Evangelist for Apache Cassandra at DataStax
This is a crash course introduction to Cassandra. You'll step away understanding how it's possible to utilize this distributed database to achieve high availability across multiple data centers, scale out as your needs grow, and not be woken up at 3am just because a server failed. We'll cover the basics of data modeling with CQL, and understand how that data is stored on disk. We'll wrap things up by setting up Cassandra locally, so bring your laptops!
This document introduces Cassandra, an open source distributed database. It discusses Cassandra's architecture including ring-based replication across nodes, use of hash rings to distribute data, and tunable consistency levels. It also covers Cassandra's write path using commit logs and SSTables, read path by querying nodes, and data modeling using tables, partitions, and clustering keys. Examples demonstrate modeling single-row and multi-row partitions in Cassandra.
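The hash-ring distribution mentioned above can be illustrated with a toy sketch (Cassandra itself uses Murmur3 tokens and virtual nodes; MD5 and the node names here are stand-ins for illustration):

```python
import bisect
import hashlib

def token(key: str) -> int:
    """Hash a key onto the ring (Cassandra uses Murmur3; MD5 is a stand-in)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy hash ring: each node owns the arc of tokens up to its own token."""
    def __init__(self, nodes):
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def node_for(self, partition_key: str) -> str:
        # Walk clockwise to the first node token >= the key's token,
        # wrapping around to the first node at the end of the ring.
        i = bisect.bisect(self.tokens, token(partition_key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")   # always maps to the same node
```

Because placement depends only on the key's hash, every coordinator computes the same owner without central lookup, which is what makes the peer-to-peer design possible.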
Basic Concepts. Webinar 1: Introduction to NoSQL (MongoDB)
This document contains an agenda and summary for a webinar on introducing NoSQL databases. The webinar covers why NoSQL databases exist, different types of NoSQL databases including key-value stores, column stores, graph stores, multi-model databases and document stores. It also discusses MongoDB specifically, covering its document data model, indexing, querying, aggregation capabilities, replication and sharding for scalability. The webinar invites participants to a follow up session on building a first MongoDB application.
A Deep Dive into Apache Cassandra for .NET Developers (Luke Tillman)
.NET developers have a lot of options when it comes to databases these days. Apache Cassandra is a scalable, fault-tolerant database that has already found its way into more than 25% of the Fortune 100 and continues to grow in popularity. But what makes it different from the myriad of other options available? In this talk, we’ll take a deep dive into Cassandra and learn about:
- Cassandra’s internals and how it works
- CQL (the SQL-like query language for Cassandra)
- Data Modeling like a pro
- Tools available for developers
- Writing .NET code that talks to Cassandra
If there’s time and interest, we’ll finish up with how some companies are already using Cassandra to power services you probably interact with in your daily life. You’ll leave with all the tools you need to start building highly available .NET applications and services on top of Cassandra.
NoSQL, SQL, NewSQL - methods of structuring data (Tony Rogerson)
Today’s environment is a polyglot database, that is to say, it’s made up of a number of different database sources and possibly types. In this session we’ll look at some of the options of storing data – relational, key/value, document etc. I’ll overview what is SQL, NoSQL and NewSQL to give you some context for today’s world of data storage.
Get a look under the covers: Learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your delivery of queries and improve overall database performance. This session explains how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use workload management, tune your queries, and use Amazon Redshift's interleaved sorting features.
Cassandra is a distributed database designed to handle large amounts of data across commodity servers. It aims for high availability with no single points of failure. Data is distributed across nodes and replicated for redundancy. Cassandra uses a decentralized design with peer-to-peer communication and an eventually consistent model. It requires denormalized data models and queries to be defined prior to data structure.
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ... (Amazon Web Services)
Get a look under the covers: Learn tuning best practices for taking advantage of Amazon Redshift's columnar technology and parallel processing capabilities to improve your delivery of queries and improve overall database performance. This session explains how to migrate from existing data warehouses, create an optimized schema, efficiently load data, use workload management, tune your queries, and use Amazon Redshift's interleaved sorting features. You'll then hear from a customer who has leveraged Redshift in their industry and learn how they adopted many of the best practices. Learn More: https://aws.amazon.com/government-education/
This document provides an overview of DFSMS Basics and Data Set Fundamentals. It discusses the structure of data sets on direct access storage devices (DASD) including volumes, tracks, cylinders, the volume table of contents (VTOC), catalogs, and data set names. It also summarizes the different types of data set organizations including non-VSAM (direct, sequential, partitioned) and VSAM (KSDS, ESDS, RRDS, linear) as well as newer technologies like PDSEs, HFS, and zFS. The document concludes with discussions on common data set uses, limitations, extended format features, and defining data set attributes in JCL or with IDCAMS.
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze big data for a fraction of the cost of traditional data warehouses. By following a few best practices, you can take advantage of Amazon Redshift’s columnar technology and parallel processing capabilities to minimize I/O and deliver high throughput and query performance. This webinar will cover techniques to load data efficiently, design optimal schemas, and use workload management.
Learning Objectives:
• Get an inside look at Amazon Redshift's columnar technology and parallel processing capabilities
• Learn how to migrate from existing data warehouses, optimize schemas, and load data efficiently
• Learn best practices for managing workload, tuning your queries, and using Amazon Redshift's interleaved sorting features
Who Should Attend:
• Data Warehouse Developers, Big Data Architects, BI Managers, and Data Engineers
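The columnar idea behind the Redshift sessions above can be sketched in a few lines (a toy model with invented data; real columnar engines add compression, zone maps, and parallel scans):

```python
# Row store: one record per entry; scanning one column still touches
# every field of every record.
rows = [
    {"region": "us", "sales": 100},
    {"region": "eu", "sales": 250},
    {"region": "us", "sales": 175},
]

# Column store: each column is a contiguous array, so an aggregate
# reads only the columns it needs. This is the core idea behind
# Redshift's columnar layout and its reduced I/O for analytics.
columns = {
    "region": [r["region"] for r in rows],
    "sales":  [r["sales"] for r in rows],
}

total = sum(columns["sales"])  # touches only the "sales" column
us_sales = sum(s for reg, s in zip(columns["region"], columns["sales"])
               if reg == "us")
```

For wide tables with hundreds of columns, scanning two arrays instead of every record is where the throughput gain comes from.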
The document provides an overview of cyber security concepts and the Data Encryption Standard (DES) algorithm. It defines key terminology like plaintext, ciphertext, encryption, decryption, and cryptography. It explains that DES is a symmetric block cipher that encrypts data in 64-bit blocks using a 56-bit key. DES operates by performing an initial permutation on the plaintext, then uses 16 rounds of encryption involving substitution boxes and key-dependent permutation/XOR operations to generate the ciphertext.
This document provides an overview of the Data Encryption Standard (DES) algorithm. It describes how DES was adopted as a standard in 1977, uses a 64-bit block size and 56-bit key, and has been widely used for encryption. The document outlines the key components of DES, including the initial permutation, round structure using substitution boxes and key schedule, as well as the decryption process. It notes that while DES was controversial due to its 56-bit key size, it exhibits good diffusion properties. However, it has been shown to be vulnerable to brute force and timing attacks in recent years.
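The 16-round structure described in the DES summaries above is a Feistel network. A minimal sketch of that structure follows; the round function here is a toy stand-in for DES's expansion, S-box, and permutation steps, and the key values are invented:

```python
def feistel_rounds(left, right, round_keys, f):
    """Run a Feistel network (DES runs 16 such rounds on 32-bit halves)."""
    for k in round_keys:
        # Each round swaps halves and XORs one half with f of the other.
        left, right = right, left ^ f(right, k)
    return left, right

def feistel_decrypt(left, right, round_keys, f):
    # Decryption is the same structure with the key schedule reversed
    # and the halves swapped on the way in and out.
    l, r = feistel_rounds(right, left, list(reversed(round_keys)), f)
    return r, l

def toy_f(half, key):
    # Stand-in round function; any function works, because the XOR
    # makes each round invertible regardless of f.
    return (half * 2654435761 + key) & 0xFFFFFFFF

keys = [0x0F0F, 0x3C3C, 0xA5A5, 0x5A5A]       # invented key schedule
ct = feistel_rounds(0x01234567, 0x89ABCDEF, keys, toy_f)
pt = feistel_decrypt(*ct, keys, toy_f)        # recovers the plaintext
```

This also shows why DES decryption reuses the encryption hardware: only the order of the round keys changes.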
Cassandra is a distributed, column-oriented database designed to be highly scalable and fault-tolerant. It distributes data across nodes based on the partitioner, replicates data based on the replication strategy, and achieves consistency between replicas using a combination of hinted handoffs and read repair during reads and writes. Keyspaces contain column families which store rows of columns in a flexible schema-less data model that scales horizontally by adding more nodes.
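The tunable consistency these Cassandra summaries refer to reduces to simple arithmetic over the replication factor. A sketch of the rule (the function name is illustrative):

```python
def is_strongly_consistent(n, r, w):
    """With replication factor n, a read of r replicas is guaranteed to
    overlap a write acknowledged by w replicas whenever r + w > n, so
    the read always sees the latest acknowledged write."""
    return r + w > n

# A common Cassandra setup: RF=3 with QUORUM reads and writes.
quorum = 3 // 2 + 1                                   # = 2 of 3 replicas
strong = is_strongly_consistent(3, quorum, quorum)
fast_but_eventual = is_strongly_consistent(3, 1, 1)   # ONE/ONE: eventual
```

Dialing r and w per query is the "tunable" part: lower values trade the overlap guarantee for latency, leaving hinted handoff and read repair to converge the replicas.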
The document describes different types of editors used in HDL design including schematic and layout editors.
Schematic editors allow designers to capture circuit connectivity and hierarchy using libraries of component symbols. Layout editors allow designers to specify the physical structure of a design using geometric shapes.
Both editor types provide file and display commands, drawing tools to create and edit circuit elements, and data structures like linked lists and quad trees to store and query design information.
The Cloud topic is everywhere, not only for big software companies, but also for our customers and of course for all service providers.
How to move from the traditional IT to a full Cloud environment and how to manage the transition phase?
We show you the Trivadis Cloud transition approach, standardized and proven, which leads you into a safe and optimized usage of cloud services in your daily business.
It’s all about Data - a Trivadis core competence for decades - no matter which deployment model we choose.
In this presentation we shed light on various Cloud strategies and concrete technological aspects.
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming more and more important in the world of sensors, social media streams, and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analyzed, often with many consumers or systems interested in all or part of the events. Depending on the size and quantity of such events, this can quickly be in the range of Big Data. How can we efficiently collect and transmit these events? How can we make sure that we can always report on historical events? How can these new events be integrated into the traditional infrastructure and application landscape?
Starting with a product and technology neutral reference architecture, we will then present different solutions using Open Source frameworks and the Oracle Stack both for on premises as well as the cloud.
Similar to Le monde NOSQL pour les spécialistes du relationnel
In this session we will present the different ways of using SQL Server in a cloud infrastructure (Microsoft Azure). We will cover hybrid scenarios, migration, backup, and hosting SQL Server databases in IaaS or PaaS mode.
During this presentation, we will introduce basic concepts of data science and discuss a project carried out for one of our customers.
We will see how data science projects can easily be carried out with the statistical programming language R, as well as with its integration into the new Microsoft SQL Server 2016 suite.
This session shows you how you can use Microsoft Azure to build a highly scalable solution for event processing. You can use this approach for classic IoT scenarios or, for example, to capture telemetry data from a widely distributed application. Each application instance then sends data to Azure's Event Hub. In this session you will get insights not only into the Event Hub but also into Stream Analytics. Stream Analytics is used to aggregate the millions of events coming from the Event Hub using a SQL-like syntax. From Stream Analytics, the data can be pushed into a database or, for example, into a live dashboard in Microsoft's Power BI.
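The kind of aggregation this session attributes to Stream Analytics can be sketched in plain Python as a tumbling-window count per device (the event data and window size are invented; Stream Analytics would express this declaratively with a GROUP BY over a tumbling window):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Group (timestamp, device) events into fixed, non-overlapping
    windows and count events per (window, device) pair."""
    counts = defaultdict(int)
    for ts, device in events:
        window_start = ts - ts % window_seconds  # align to window boundary
        counts[(window_start, device)] += 1
    return dict(counts)

# Invented telemetry stream: (seconds, device id)
events = [(0, "dev1"), (3, "dev1"), (7, "dev2"), (12, "dev1")]
agg = tumbling_window_counts(events, 10)
# → {(0, 'dev1'): 2, (0, 'dev2'): 1, (10, 'dev1'): 1}
```

A real deployment pushes these per-window aggregates downstream, e.g. into a database or a Power BI dashboard, instead of returning them from a function.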
Today, companies use various channels to communicate with their customers. As a consequence, a lot of data is created, more and more of it outside the traditional IT infrastructure of an enterprise. This data often has no common format and is continuously created at ever increasing volume. With the Internet of Things (IoT) and its sensors, both the volume and the velocity of data become even more extreme.
To achieve a complete and consistent view of a customer, all this customer-related information has to be included in a 360-degree view in a real-time or near-real-time fashion. The Customer Hub thereby becomes the Customer Event Hub: it constantly shows the current view of a customer across all interaction channels and provides an enterprise with the basis for a substantial and effective customer relationship.
This presentation shows the value of such a platform and how it can be implemented.
While we have all heard of the smart grid, the concept of the microgrid is less well known. A microgrid is a small power network fed by new renewable energies (NRE). The intermittent production of these energies requires rethinking the way the electricity grid is managed. Data mining serves as a lever to better control and exploit the multitude of data brought by the era of smart grids. These advanced data-mining skills make it possible, in particular, to establish prediction methods that prove crucial for optimizing the use of NRE production by resorting to storage. System integrators collect the information from smart meters and feed it into data-mining processes in order to predict, to the quarter hour, the consumption and production of a building. A presentation of concrete techniques and projects in the service of the energy transition.
Big Data and Fast Data combined – is it possible? An introduction to Big Data architectures. Ulises Fasoli, Senior Consultant, Trivadis. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
With biGenius® on Azure, forget the technology and focus your efforts on the business. Patricia Düggeli, Principal Consultant, Trivadis. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
The Swiss Data Cloud, as seen by the operator UPC Cablecom Business. Laurent Fine, Large Account Manager, UPC Cablecom. Presentation given at the Swiss Data Forum, 24 November 2015, Lausanne.
IoT – lessons learned from customer projects in the IoT domain. Michael Epprecht, Technical Specialist in the Global Black Belt IoT Team at Microsoft. Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
Building a home security system with Microsoft Azure, Surface RT, Raspberry Pi and Windows Phone. Thomas Huber, Principal Consultant, Trivadis, and Microsoft Most Valuable Professional (MVP). Talk given at the Swiss Data Forum, 24 November 2015, Lausanne.
This document provides an overview of real-time analytics with Apache Cassandra and Apache Spark. It discusses how Spark can be used for stream processing with Cassandra for storage. Spark Streaming ingests real-time data from sources like Kafka and processes it using DStreams that operate on microbatches. This allows joining streaming and batch data. Cassandra is optimized for high write throughput and scales horizontally. The combination of Spark and Cassandra enables transactional analytics over large datasets in real time.
Mobility in the enterprise – the evolution of mobility and its impact on the enterprise. Presentation given at the Swiss Data Forum, 24 November 2015, Lausanne.
Lessons learned from a Business Intelligence project carried out at EVAM using an Agile methodology and a Data Vault data model. Presentation given at the Swiss Data Forum, 24 November 2015, Lausanne.
Codeless Generative AI Pipelines
(GenAI with Milvus)
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
#SDF16
Programme
1. NoSQL Landscape
2. What is Apache Cassandra?
3. Cassandra Architecture
4. Data Distribution & Replication
5. Cassandra Data Model
6. Cassandra Write path
7. Cassandra Read path
8. Tools
9. Last thoughts and conclusion
NoSQL Landscape : types of databases
• Key-Value – keys map to arbitrary values of any data type
• Wide Column – keys mapped to sets of n-number of typed columns
• Document – document sets (JSON) queryable in whole or part
• Graph – data elements each relate to n others in a graph/network
Brewer's CAP Theorem
• Consistency – do you get identical results, regardless of which node is queried?
• Availability – can the cluster respond to very high write and read volumes?
• Network Partition tolerance – is the cluster still available when part of it goes dark?
Any networked shared-data system can have at most two of the three desirable properties: CA, CP, or AP.
Cassandra
Fully distributed, with no single point of failure
Free and open source, with deep developer support
Highly performant, with near-linear horizontal scaling in proper use cases
Influenced by Google Bigtable (data model) and Amazon Dynamo (distribution)
Use cases for Cassandra
Product Catalog / Playlists
Personalization
Ads / Recommendations
Fraud Detection
Time Series
IoT / Sensor Data
Graph / Network data
Architecture overview
Designed with the understanding that system/hardware failures can and do occur
Peer-to-peer, distributed system
All nodes are identical in the cluster
Data partitioned among all nodes in the cluster
Custom data replication to ensure fault tolerance
Read/Write-anywhere design
What is a cluster?
A peer-to-peer set of nodes
• Node – one Cassandra instance
• Rack – a logical set of nodes
• Data Center – a logical set of racks
• Cluster – the full set of nodes which map to a single complete token ring
[Diagram: a cluster of two data centers (East and West), each containing two racks of four nodes, all mapping onto a single Murmur3 token ring spanning the range -2^63 to +2^63.]
What is a cluster?
Nodes join a cluster based on the configuration of their own conf/cassandra.yaml file
Some key settings :
• cluster_name
• seeds
• listen_address
[Diagram: four nodes (127.0.0.1–127.0.0.4), with 127.0.0.1 acting as the seed node.]
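The key settings above appear in each node's conf/cassandra.yaml. A hypothetical excerpt for one of the nodes in the diagram; the cluster name and addresses are illustrative only:

```yaml
# Illustrative excerpt of conf/cassandra.yaml for a node joining
# the cluster; values here are example placeholders.
cluster_name: 'SDF16 Demo Cluster'
listen_address: 127.0.0.2
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "127.0.0.1"
```

Every node that should join the same cluster must agree on cluster_name and point at at least one common seed.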
What is a coordinator?
The node chosen by the client to receive a particular read or write request to its cluster
Any node can coordinate any request
Each client request may be coordinated by a different node
No single point of failure – a fundamental principle of Cassandra's architecture
Data partitioning & distribution
Nodes are logically structured in a ring topology
Each node is responsible for a part of the overall database
Data is assigned to a specific node based on a hashed value of its key
Lightly loaded nodes can move position on the ring to relieve highly loaded nodes
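The hash-based assignment described above can be sketched as a toy model. Real Cassandra uses the Murmur3 partitioner and virtual nodes; here MD5 merely stands in to get a stable integer hash, and the node names are made up:

```python
import hashlib
from bisect import bisect_left

# Toy sketch of token-based partitioning (Cassandra uses Murmur3;
# MD5 is only a stand-in for a stable hash function here).
def token(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        # Sort nodes by token; each node owns the range that ends
        # at its own token (its "primary range").
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def owner(self, partition_key: str) -> str:
        # First node whose token is >= the key's token,
        # wrapping around past the end of the ring.
        i = bisect_left(self.tokens, token(partition_key)) % len(self.tokens)
        return self.ring[i][1]

ring = Ring(["node1", "node2", "node3", "node4"])
print(ring.owner("user:42"))  # always the same node for the same key
```

Because the mapping depends only on the key's hash and the sorted node tokens, any node can compute where a partition lives without central coordination.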
Data replication
Defined at keyspace level
o Replication factor : how many replicas to make
o Replication strategy : on which nodes each replica should be placed
All partitions are "replicas"; there are no "originals"
First replica : placed on the node owning the token's primary range
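Since replication is defined at keyspace level, both settings appear in the keyspace DDL. A hedged sketch in CQL; the keyspace name music and the data-center names east and west are illustrative, not from the slides:

```sql
-- Illustrative keyspace: three replicas in each of two data centers.
CREATE KEYSPACE music
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'east': 3,
    'west': 3
  };
```

NetworkTopologyStrategy places replicas per data center; for a single-DC test cluster, SimpleStrategy with a single replication_factor is the usual shortcut.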
Data Replication / Distribution
Native data replication / distribution support
Transparently handled by Cassandra
Multi-data center capable
Hybrid Cloud/On premise support
What is consistency ?
The partition key determines which nodes are sent any given request
• Consistency Level : how many nodes must acknowledge before the response is sent – just one (CL=ONE)? two (CL=TWO)? 51% (CL=QUORUM)?
The meaning varies by request type :
• Write request – how many nodes must acknowledge the write?
• Read request – how many nodes must acknowledge by sending their most recent copy of the data?
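The "51%" behind CL=QUORUM is simply a strict majority of the replicas, which can be written out as a one-liner:

```python
# QUORUM means a strict majority of replicas: floor(RF / 2) + 1.
def quorum(replication_factor: int) -> int:
    return replication_factor // 2 + 1

for rf in (1, 2, 3, 5):
    print(rf, "->", quorum(rf))
```

So with the common replication factor of 3, QUORUM means 2 replicas must acknowledge; with RF=5 it means 3.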
What is immediate vs. eventual consistency?
Immediate consistency – reads always return the most recent data
• Guaranteed with Consistency Level ALL
• Highest latency (all replicas are checked and compared)
Eventual consistency – reads may return stale data
• Consistency Level ONE carries the highest risk of stale data
• Lowest latency (the first replica's answer is immediately returned)
Consistency levels range from ANY and ONE through TWO, … up to ALL (all N replicas)
Read repair is there to counteract entropy (divergent replicas)
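A common rule of thumb (glossing over edge cases such as failed or in-flight writes): reads are immediately consistent whenever the read and write replica sets are forced to overlap, i.e. R + W > RF. This is an assumption-level sketch, not a quote from the slides:

```python
# Rule of thumb: a read sees at least one replica holding the latest
# write whenever read replicas + write replicas > replication factor.
def is_immediately_consistent(r: int, w: int, rf: int) -> bool:
    return r + w > rf

print(is_immediately_consistent(2, 2, 3))  # QUORUM write + QUORUM read, RF=3: True
print(is_immediately_consistent(1, 1, 3))  # ONE write + ONE read, RF=3: False
print(is_immediately_consistent(3, 1, 3))  # ALL write + ONE read, RF=3: True
```

This is why QUORUM reads combined with QUORUM writes are the usual middle ground between latency and staleness.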
Cassandra Data Model
The Cassandra data model defines :
1. Column family – a way to store and organize data
2. Table – a two-dimensional view of a multi-dimensional column family
3. Cassandra Query Language (CQL) – a language to perform operations on tables
What is a column family?
Column family – a set of rows with a similar structure
• Sorted columns
• Multidimensional
• Distributed
• Sparse
[Diagram: three rows (key1–key3), each holding values for columns cola–cold; the intersection of a row and a column key is a cell.]
What are row, row key, column key, and column value?
• Rows – individual rows constitute a column family
• Row key – uniquely identifies a row in a column family
• Row – stores pairs of column keys and column values
• Column key – uniquely identifies a column value in a row
• Column value – stores one value or a collection of values (the cells)
[Diagram: one row key mapping to column keys cola–cold with column values va–vd.]
What are row, row key, column key, and column value? (example)
Row key "John Lennon" : born=1940, country=England, died=1980, style=Rock, type=artist
Row key "The Beatles" : country=England, founded=1957, style=Rock, type=band
The row keys identify the rows; born, country, died, founded, style, type are column keys; the stored values are column values.
What is a wide row?
Rows may be described as "skinny" or "wide"
• Skinny row – a fixed, relatively small number of column keys
• Wide row – a relatively large number of column keys (hundreds or thousands)
• For example, a row that stores all bands of the same style : the number of such bands will increase as new bands are formed
[Diagram: a "Rock" row whose column keys are band names – The Animals, The Beatles, …]
What are composite row key and composite column key?
Composite row key – multiple components separated by a colon
Composite column key – multiple components separated by a colon
• Composite column keys are sorted by each component
Example : row key Revolver:1966 with columns genre=Rock, performer=The Beatles, tracks={1: 'Taxman', ..., 14: 'Tomorrow Never Knows'}
As a wide row, Revolver:1966 instead uses composite column keys : 1:title=Taxman, 2:title=Eleanor Rigby, ..., 14:title=Tomorrow Never Knows
What are partition, partition key, row, column, and cell?
[Diagram: the column family view shown side by side with a CQL table made of single-row partitions.]
What are composite partition key and clustering column?
Table with multi-row partitions :

album_title           year  number  track_title
Revolver              1966  1       Taxman
Revolver              1966  …       …
Revolver              1966  14      Tomorrow Never Knows
Let It Be             1970  1       Two Of Us
Let It Be             1970  …       …
Let It Be             1970  11      Get Back
Magical Mystery Tour  1967  1       Magical Mystery Tour
Magical Mystery Tour  1967  …       …
Magical Mystery Tour  1967  11      All You Need Is Love

(album_title, year) form the composite partition key; number is the clustering column; each partition contains several rows; the individual values are cells.
What are composite partition key and clustering column? (column family view)
Each partition becomes one wide row keyed by the composite partition key :
• Revolver:1966 – 1:title=Taxman, …, 14:title=Tomorrow Never Knows
• Let It Be:1970 – 1:title=Two Of Us, …, 11:title=Get Back
• Magical Mystery Tour:1967 – 1:title=Magical Mystery Tour, …, 11:title=All You Need Is Love
What are static columns?
Table with multi-row partitions and static columns :

album_title  year  number  genre  performer    track_title
Revolver     1966  1       Rock   The Beatles  Taxman
Revolver     1966  …       Rock   The Beatles  …
Revolver     1966  14      Rock   The Beatles  Tomorrow Never Knows
Let It Be    1970  1       Rock   The Beatles  Two Of Us
Let It Be    1970  …       Rock   The Beatles  …
Let It Be    1970  11      Rock   The Beatles  Get Back

genre and performer are static columns : stored once per partition and shared by all rows of that partition.
What is a primary key?
The primary key uniquely identifies a row in a table
It consists of a simple or composite partition key plus all clustering columns (if present)

Single partition key (performer) :

performer       born  country  died  founded  style  type
John Lennon     1940  England  1980           Rock   artist
Paul McCartney  1942  England                 Rock   artist

Composite partition key (album_title, year) + clustering column (number) :

album_title  year  number  track_title
Revolver     1966  1       Taxman
Revolver     1966  …       …
Revolver     1966  14      Tomorrow Never Knows
Let It Be    1970  1       Two Of Us
Let It Be    1970  …       …
Let It Be    1970  11      Get Back
What is a table or CQL table?
A CQL table is a column family
• CQL tables provide two-dimensional views of a column family, which contains potentially multi-dimensional data due to composite keys and collections
"CQL table" and "column family" are largely interchangeable terms
Supported by the declarative Cassandra Query Language (CQL)
Cassandra Query Language (CQL)
Includes a Data Definition Language as a subset
SQL-like syntax, but with somewhat different semantics
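As a sketch of that SQL-like syntax, here is hedged CQL for the album/track example used throughout these slides; the table name tracks and the exact column definitions are illustrative:

```sql
-- Composite partition key (album_title, year), clustering column number.
CREATE TABLE tracks (
  album_title text,
  year        int,
  number      int,
  track_title text,
  PRIMARY KEY ((album_title, year), number)
);

INSERT INTO tracks (album_title, year, number, track_title)
VALUES ('Revolver', 1966, 1, 'Taxman');

-- Rows of one partition come back ordered by the clustering column.
SELECT track_title
FROM tracks
WHERE album_title = 'Revolver' AND year = 1966;
```

The double parentheses in PRIMARY KEY are the different-semantics part: the inner pair declares the composite partition key, and everything after it is a clustering column.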
Cassandra Data Model differences from RDBMS
• Cassandra deals with unstructured data; an RDBMS deals with structured data
• Cassandra has a flexible schema; an RDBMS has a fixed schema
• In Cassandra, a table is a list of "nested key-value pairs" (ROW x COLUMN key x COLUMN value); in an RDBMS, a table is an array of arrays (ROW x COLUMN)
• The keyspace is the outermost container for an application's data; in an RDBMS, the database is
• Tables or column families are the entities of a keyspace; tables are the entities of a database
• A row is the unit of replication in Cassandra; a row is an individual record in an RDBMS
• A column is the unit of storage in Cassandra; a column represents an attribute of a relation in an RDBMS
• Relationships are represented using collections; an RDBMS supports foreign keys and joins
Write path: how is data written?
Cassandra is a log-structured storage engine
Data is sequentially appended, not placed in pre-set locations
• RDBMS : seeks and writes values to various pre-set locations
• Cassandra : continuously appends to a log
How does the write path flow on a node?
1. The client sends the write (e.g. partition key1 first:Oscar last:Orange level:42) to a coordinator node
2. For each write request, the node appends to the CommitLog (append-only, on the file system) and applies the write to the Memtable in node memory that corresponds to the CQL table
3. Periodically, the current state of the Memtable is flushed to disk as an SSTable
4. Periodically, related SSTables are compacted from many into one
What is the CommitLog?
An append-only log used to automatically rebuild Memtables on restart of a downed node
Memtables flush to disk when the CommitLog size reaches its total allowed space
Entries are marked as flushed when the corresponding Memtable entries are flushed to disk as an SSTable
CommitLog options are configured in the cassandra.yaml file
What are Memtables and how are they flushed to disk?
Memtables are in-memory representations of a CQL table :
• Each node has a Memtable for each CQL table in the keyspace
• Each Memtable accrues writes and serves reads for data not yet flushed
• Updates to a Memtable mutate the in-memory partition (e.g. partition key1 first:Oscar last:Orange level:42)
What is an SSTable and what are its characteristics?
An SSTable ("sorted string table") is
• an immutable file of sorted partitions
• written to disk through fast, sequential I/O
• the state of a Memtable at the moment it was flushed
The current data state of a CQL table comprises
• its corresponding Memtable, plus
• all current SSTables flushed from that Memtable
SSTables are periodically compacted from many into one
Tools : CQLSH
An interactive command-line CQL utility
Supports tab completion for commands
Think of it as SQL*Plus for Cassandra
Tools : Cassandra Cluster Manager (CCM)
An open source utility that creates and manages multi-node clusters on a local machine
Not for production configuration
Useful for :
• Testing failure scenarios
• Development / prototyping without the hardware
• Version migrations
• …
Tools : Nodetool
A command-line cluster management utility
Supports over 40 commands, such as :
• status
• info
• ring
Tools : DataStax DevCenter
Visually create and navigate database objects
View query results and tune queries for faster performance
DBAs wanted
NoSQL and Cassandra will not replace the RDBMS : different tools for different jobs
Current situation :
• The community is largely driven by developers and sysadmins
• The community needs insight from DBAs to make the database evolve
• Get involved!