Architecture et modèle de données Cassandra

2013 © Trivadis
BASEL BERN BRUGES LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA
Genève 26.01.2015
Ulises Fasoli
Senior Consultant
Trivadis AG
January 2016

Agenda
1. Introduction to NoSQL datastores and Polyglot Persistence
2. What is Apache Cassandra?
3. Why Cassandra, What is DataStax?
4. Cassandra Architecture
5. Cassandra Data Model
6. Cassandra Query Language (CQL)
7. Cassandra/DataStax @ Trivadis

History of Databases
1960s File-based, Network (CODASYL) and Hierarchical Databases
1970s Relational Database
1980s SQL became the standard query language
Early 1990s Object Databases
Late 1990s XML Databases
2004 NoSQL Databases

What's wrong with Relational Databases?
• SQL provides a rich, declarative query language
• Databases enforce referential integrity
• ACID semantics
• Well understood by developers, database administrators
• Well supported by different languages, frameworks and tools
• Hibernate, JPA, JDBC, iBATIS, Entity Framework
• Well understood and accepted by operations people (DBAs)
• Configuration
• Monitoring
• Backup and Recovery
• Tuning
• Design
They are great ...

Relational Databases are great ... But!
New trends
Big Data
Concurrency
Connectivity
Diversity
P2P Knowledge
Cloud/Grid

Problem: Complex Object Graphs
Object/Relational impedance mismatch
Complicated to map a rich domain model to a relational schema
Performance issues
• Many rows in many tables
• Many joins
• Eager vs. lazy loading
Relational tables: ORDER, ADDRESS, CUSTOMER, ORDER_LINES
Example object graph:
Order
  ID: 1001
  Order Date: 15.9.2012
  Customer
    First Name: Peter
    Last Name: Sample
    Billing Address
      Street: Somestreet 10
      City: Somewhere
      Postal Code: 55901
  Line Items
    Name         | Quantity | Price
    Ipod Touch   | 1        | 220.95
    Monster Beat | 2        | 190.00
    Apple Mouse  | 1        | 69.90

Problem: Schema evolution
Adding attributes to an object => columns have to be added to the table
Expensive if there is a lot of data in that table
• Locks are held on the table for a long time
• What if new values should be mandatory? A NOT NULL constraint cannot be enforced on existing rows
• Application downtime ...

Problem: Semi-structured data
A relational schema doesn't easily handle semi-structured data
Common solutions
• Name/Value table
- Poor performance
- Lack of constraints
• Serialize as Blob
- Fewer joins, but no query capabilities

Problem: Scaling
Scaling writes is difficult, expensive or impossible => Big Data
Scaling a relational database:
• Vertical scaling is limited and expensive
• Horizontal scaling is limited and expensive
[Figure: clients connecting to a Single DB => Partitioned Table (P1, P2, P3) => Database Sharding => Database Cluster (Node 1, Node 2)]

So, what’s Wrong With RDBMS?
• Many programmers are already familiar with it.
• Transactions and ACID make development easy.
• Lots of tools to use.
• Rigid schema design.
• Harder to scale.
• Replication.
Nothing. No one size fits all.

Solution: NoSQL ?
No standard definition of what NoSQL means
• Not Only SQL and not No SQL
• Not only relational would have been better
The term came into use at a workshop organized in 2009
Use the right tools (DBs) for the job
It is more like a feature set, or even the absence of a feature set

Use Cases for NoSQL
• Massive write performance.
• Fast key-value lookups.
• Flexible schema and data types.
• No single point of failure.
• Fast prototyping and development.
• Out of the box scalability.
• Easy maintenance.

Brewer's CAP Theorem
Any networked shared-data system can have at most two of the three desirable properties:
• Consistency: all of the nodes see the same data at the same time, regardless of where the data is stored
• Availability: node failures do not prevent survivors from continuing to operate
• Network partition tolerance: the system continues to operate despite arbitrary message loss
[Figure: Consistency, Availability and Network Partition Tolerance with the pairings CA, CP and AP; all three together: n/a]

Data Store Positioning
[Figure: data stores positioned along two axes, "Scalability" and "Standardized Model, Tooling, Complexity": Key-value, Wide Column (Column Families / Extensible Records), Document, Graph, Relational (the "SQL Comfort Zone"), Multi Dimensional]

Polyglot Persistence
In 2006, Neal Ford coined the term Polyglot Programming
• Applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems
Polyglot Persistence defines a hybrid approach to persistence
• Using multiple data storage technologies
• Selected based on the way data is being used by individual applications
• Why store binary images in an RDBMS when there are better storage systems?
[Figure: Polyglot Programmer]

Polyglot Persistence
Today we use the same database for all kinds of data
• Business transactions, session management data, reporting, logging information, content information, ...
These kinds of data do not need the same availability, consistency or backup properties
Polyglot data storage allows relational and NoSQL data stores to be mixed and matched
[Figure: "Traditional" persistence model: an e-commerce application keeps shopping cart data, user sessions, completed orders, product catalog and recommendations in a single RDBMS]
[Figure: Polyglot persistence model: the same e-commerce application spreads this data over Key-Value, RDBMS, Document and Graph stores]


Definition of Cassandra
Apache Cassandra™ is a free
• Distributed…
• High performance…
• Extremely scalable…
• Fault tolerant (i.e. no single point of failure)…
post-relational database solution.
Cassandra can serve both as a real-time datastore (the "system of record") for online/transactional applications and as a read-intensive database for business intelligence systems.

History of Cassandra
[Figure: history of Cassandra, with Bigtable and Dynamo as its predecessors]

Architecture Overview
Cassandra was designed with the understanding that system and hardware failures can and do occur:
• Peer-to-peer, distributed system
• All nodes the same
• Data partitioned among all nodes in the cluster
• Custom data replication to ensure fault tolerance
• Read/Write-anywhere design

Big Data Scalability
• Capable of comfortably scaling to petabytes
• New nodes = Linear performance increases
• Add new nodes online

Who is using Cassandra?
The largest publicly known cluster has over 300 TB of data spanning 400 machines


Why Cassandra?
• Tunable data consistency
• Flexible schema design
• Data compression
• CQL language (like SQL)
• Support for key languages and platforms
• No need for special hardware or software
• Gigabyte to petabyte scalability
• Linear performance gains through adding nodes
• No single point of failure
• Easy replication / data distribution
• Multi-data center and Cloud capable
• No need for a separate caching layer

Cassandra Use Cases
Product Catalog / Playlists
Personalization
• Ads
• Recommendations
• Ratings
Fraud Detection
Time Series
• Finance
• Smart Meter
IoT / Sensor Data
Graph / Network data

DataStax Enterprise Edition (DSE)

DataStax OpsCenter


Architecture Overview
Nodes communicate with each other through the gossip protocol, which exchanges information across the cluster every second
A commit log is used on each node to capture write activity, which ensures data durability
Data is also written to an in-memory structure (the memtable), which is flushed to disk as an SSTable once it is full

No Single Point of Failure
• All nodes are the same
• Customized replication affords tunable data redundancy
• Read/write from any node
• Can replicate data among different physical data center racks

Easy Replication / Data Distribution
• Transparently handled by Cassandra
• Multi-data center capable
• Exploits all the benefits of Cloud computing
• Able to do a hybrid Cloud/On-premise setup

Partitioning
• Nodes are logically structured in a ring topology.
• The hashed value of the key associated with a data partition is used to assign it to a node in the ring.
• Lightly loaded nodes move position on the ring to relieve heavily loaded nodes.
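The assignment of a hashed key to a node on the ring can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Ring` class, the node names, the token values and the use of MD5 as hash function are all made up for the example and are not Cassandra's actual implementation.

```python
import bisect
import hashlib


def token(key):
    """Hash a partition key to a position on the ring (MD5 is an illustrative choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical ring: every node owns the range that ends at its token."""

    def __init__(self, node_tokens):
        # Sort nodes by token so the ring can be searched with bisect.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def owner(self, key_token):
        """Return the first node whose token is >= key_token, wrapping around the ring."""
        i = bisect.bisect_left(self._tokens, key_token)
        return self._ring[i % len(self._ring)][1]


# Three made-up nodes with small tokens to keep the example readable.
ring = Ring({"A": 100, "B": 200, "C": 300})
print(ring.owner(150))  # the key token 150 falls into the range owned by B
```

In a real cluster the key token would come from `token(key)` and the node tokens would span the full hash range; the small integers here only keep the lookup easy to follow.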

Data Replication
Replication for high availability and data durability
• Replication factor N: each row is replicated at N nodes
• Each row key k is assigned to a coordinator node
• The coordinator node is responsible for replicating the rows within its key range

Partitioning and Replication
[Figure: ring over the hash range 0 to 1 with nodes A to F; h(key1) and h(key2) mark key positions on the ring, replication factor N=3]

Data Replication
Each data item is replicated at N (replication factor) nodes.
Different Replication Policies
• Rack Unaware – replicate data at N-1 successive nodes after its coordinator
• Rack Aware – uses 'Zookeeper' to choose a leader which tells nodes the range they are replicas for
• Datacenter Aware – similar to Rack Aware but the leader is chosen at Datacenter level instead of Rack level.
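The Rack Unaware policy is simple enough to sketch in Python: the coordinator keeps one copy and the next N-1 nodes on the ring receive the others. The function name `replicas` and the node list are hypothetical and only serve the illustration; the rack- and datacenter-aware policies are not covered.

```python
def replicas(ring_nodes, coordinator, n):
    """Return the coordinator plus the n-1 nodes that follow it on the ring."""
    if n > len(ring_nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    start = ring_nodes.index(coordinator)
    # Walk clockwise from the coordinator, wrapping around at the end of the list.
    return [ring_nodes[(start + i) % len(ring_nodes)] for i in range(n)]


# Nodes listed in ring order, as on a six-node ring labelled A to F.
nodes = ["A", "B", "C", "D", "E", "F"]
print(replicas(nodes, "B", 3))  # ['B', 'C', 'D']
print(replicas(nodes, "F", 3))  # ['F', 'A', 'B'] (wraps around the ring)
```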

Write Path
When a write occurs, Cassandra stores the data in a structure in memory,
the Memtable, and also appends writes to the commit log on disk,
providing configurable durability.
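A toy model of this write path, assuming a hypothetical `Node` class whose commit log is a plain list and whose memtable is flushed after a fixed number of rows. Real Cassandra flushes on memory thresholds and writes binary files, so this is a sketch of the idea and not of the implementation.

```python
class Node:
    """Toy write path: append to the commit log, update the memtable, flush when full."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stands in for the append-only log on disk
        self.memtable = {}     # in-memory structure: row key -> columns
        self.sstables = []     # tables flushed from the memtable, never modified afterwards
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (rows)

    def write(self, row_key, columns):
        # 1. Record the write in the commit log for durability.
        self.commit_log.append((row_key, dict(columns)))
        # 2. Apply the write to the memtable.
        self.memtable.setdefault(row_key, {}).update(columns)
        # 3. Flush once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An SSTable is written sorted by row key; the memtable then starts empty.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}


node = Node(memtable_limit=2)
node.write("john", {"age": 37})
node.write("eric", {"age": 38})  # second row reaches the limit and triggers a flush
```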

Write Requests
The coordinator sends a write request to all replicas that own the row being written

Write Consistency
The consistency level for writing to Cassandra specifies on how many replicas the write must succeed before an ACK is returned to the client
• Quorum: (replication_factor / 2) + 1, with the division rounded down
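As a quick worked example of the quorum formula, the Python helper below (the name `quorum` is just for illustration) uses integer division, which rounds down:

```python
def quorum(replication_factor):
    """Number of replicas that must acknowledge: (replication_factor / 2) + 1, rounded down."""
    return replication_factor // 2 + 1


for rf in (1, 3, 5, 6):
    print(rf, quorum(rf))  # 1 -> 1, 3 -> 2, 5 -> 3, 6 -> 4
```

With a replication factor of 3, a quorum write therefore succeeds as soon as 2 replicas have acknowledged it, so one replica may be down.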

Read Path
When a read request for a row comes in to a node, the row must be combined from all SSTables on that node that contain columns from the row in question, as well as from any unflushed memtables, to produce the requested data
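The merge can be sketched in Python under the assumption that every source (SSTable or memtable) maps a row key to columns stored as `(value, timestamp)` pairs, and that for each column the value with the newest timestamp wins. The function `read_row` and the sample data are hypothetical.

```python
def read_row(row_key, sstables, memtable):
    """Combine one row from all SSTables and the memtable; the newest timestamp wins."""
    merged = {}
    for source in list(sstables) + [memtable]:
        for name, (value, ts) in source.get(row_key, {}).items():
            # Keep a column only if it is new or more recent than what we have.
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    return {name: value for name, (value, _ts) in merged.items()}


# Two SSTables and an unflushed memtable holding fragments of the same row.
sstable_1 = {"john": {"age": (37, 1), "role": ("dev", 1)}}
sstable_2 = {"john": {"age": (38, 5)}}
memtable = {"john": {"role": ("ceo", 3)}}

print(read_row("john", [sstable_1, sstable_2], memtable))
# {'age': 38, 'role': 'ceo'}
```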

Read Requests
There are two types of read requests that a coordinator can send to a
replica:
• A direct read request
• A background read repair request
The number of replicas contacted by a direct read request is determined by
the consistency level specified by the client.

Read Consistency
The consistency level for reading from Cassandra specifies how many replicas must respond before a result is returned to the client
• Quorum: (replication_factor / 2) + 1, with the division rounded down
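Read and write consistency levels interact: when the number of replicas read plus the number of replicas written exceeds the replication factor, every read contacts at least one replica that holds the latest acknowledged write. This rule is not stated on the slides; it is added here as a well-known property of quorum systems, and the helper names in the Python sketch are hypothetical.

```python
def quorum(replication_factor):
    """Quorum as defined above: (replication_factor / 2) + 1, rounded down."""
    return replication_factor // 2 + 1


def replicas_overlap(read_replicas, write_replicas, replication_factor):
    """True if every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(replicas_overlap(quorum(rf), quorum(rf), rf))  # True: 2 + 2 > 3
print(replicas_overlap(1, 1, rf))                    # False: 1 + 1 <= 3
```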


Cassandra Data Model
• A table is a multi-dimensional map indexed by a key (the row key).
• Columns are grouped into column families
• Dynamic schema design allows for much more flexible data storage than a rigid RDBMS schema
• Each column has
- Name
- Value
- Timestamp

How Cassandra stores data
• Model brought from Google Bigtable
• Row Key and a lot of columns
• Column names sorted (UTF8, Int, Timestamp, etc.)
[Figure: a row key followed by 1 to 2 billion columns, each with column name, column value, timestamp and TTL; a table holds billions of rows]

Cassandra Data Model
[Figure: a Keyspace containing several Column Families]

Row, row key, column key, and column value
[Figure: one row with its row key, the column keys (or column names) cola, colb, colc, cold and the column values (or cells) va, vb, vc, vd]
• Rows: individual rows constitute a column family
• Row key: uniquely identifies a row in a column family
• Row: stores pairs of column keys and column values
• Column key: uniquely identifies a column value in a row
• Column value : stores one value or a collection of values

Static vs. Dynamic Column Family
Static column family (skinny rows)
• Contains a predefined set of columns with metadata
• Number of columns can vary across multiple rows within the column family
• Similar to an RDBMS, except that there are no NULL values
Row key     | Columns
John Lennon | born: 1940, country: England, died: 1980, style: Rock, type: artist
The Beatles | country: England, founded: 1957, style: Rock, type: band

What is a wide row?
Rows may be described as “skinny” or “wide”
• Wide row – has a relatively large number of column keys (hundreds or thousands); this number may increase as new data values are inserted
- For example, a row that stores all bands of the same style
- The number of such bands will increase as new bands are formed
• Note that column values do not exist in this example
- The column key – in this case a band name – stores all the data desired
- Could have stored the number of albums, or year founded, etc., as column values
Example wide row: row key Rock, column keys The Animals, The Beatles, ...
©2014 DataStax Training. Use only with permission.

What are composite row keys and composite column keys?
• Composite row key – multiple components separated by a colon
- 'Revolver' and 1966 are the album title and year
- The 'tracks' value is a collection (map)
• Composite column key – multiple components separated by a colon
- Composite column keys are sorted by each component
Example with a composite row key:
Revolver:1966 | genre: Rock, performer: The Beatles, tracks: {1: 'Taxman', ..., 14: 'Tomorrow Never Knows'}
Example with composite column keys:
Revolver:1966 | 1:title: Taxman, 2:title: Eleanor Rigby, ..., 14:title: Tomorrow Never Knows

Data Modelling with Cassandra
• De-normalize, De-normalize, De-normalize
• Forget about old-school 3NF
• De-normalize wherever you can for quicker retrieval and let application logic
handle the responsibility of reliably updating redundancies
• Rows are gigantic and sorted
• Giga-sized rows (2 billion columns max) can be used to store sortable and
sliceable columns
• Comments by timestamp, ordered bids by quoted price, ratings by product, ...
• One row, one machine
• Each row stays on one machine
• Rows are not shared across nodes
• Beware of this, don't create hotspots with a high demand row!
From Query to Model

Remember this
• Cassandra finds rows fast
• Cassandra scans columns fast
• Cassandra does not scan rows
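These three rules can be illustrated with a small Python model: a hash lookup finds the row, and a binary search over the sorted column names returns a slice without touching other rows. The `WideRow` class, the `table` dictionary and the sample data are assumptions made for this sketch, not Cassandra internals.

```python
import bisect


class WideRow:
    """Toy wide row: column names are kept sorted so ranges can be sliced quickly."""

    def __init__(self):
        self._names = []    # sorted column names
        self._values = {}   # column name -> column value

    def put(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return all (name, value) pairs with start <= name <= end, in sorted order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


table = {}  # row key -> WideRow; finding a row is a single hash lookup


def get_slice(row_key, start, end):
    row = table.get(row_key)
    return row.slice(start, end) if row else []


# Comments for one product, keyed by a made-up integer timestamp.
table["product42"] = WideRow()
for ts, text in [(30, "c"), (10, "a"), (20, "b")]:
    table["product42"].put(ts, text)

print(get_slice("product42", 10, 20))  # [(10, 'a'), (20, 'b')]
```

Answering a question such as "which rows contain a given value" would require visiting every entry of `table`, which is exactly the row scan the slide warns against.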

Cassandra API – Thrift vs. CQL
Thrift
• exposes the internal storage structure of Cassandra pretty much directly
• Complicated, low-level, full control
• legacy
CQL
• New way to go
• Provides thin abstraction layer over Cassandra's internal structure
• Hides some distracting and useless implementation details
• Allows native syntax for common encodings/idioms (like collections) instead of letting each client library re-implement them in its own, different and thus incompatible way

CQL Language
Very similar to RDBMS SQL syntax
Create objects via DDL (e.g. CREATE…)
Core DML commands supported: INSERT, UPDATE, DELETE
Query data with SELECT
Current version is CQL3

CQL Shell for Apache Cassandra
cqlsh is the command line utility for executing CQL commands (think of SQL*Plus for Cassandra)
CQL3 is default since Cassandra 1.2
$ cqlsh
Connected to DataStaxCluster at localhost:9160.
[cqlsh 4.1.0 | Cassandra 2.0.5.24 | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh>

The CQL/Cassandra Mapping – Static Table
CREATE TABLE employee (
  name text PRIMARY KEY,
  age int,
  role text);
CQL view of the table:
 name | age | role
------+-----+------
 john |  37 | dev
 eric |  38 | ceo
Internal storage (row key, then column name and value pairs):
john | age: 37 | role: dev
eric | age: 38 | role: ceo

Create a Dynamic table (wide-row) Employee
A Dynamic Table is also created with the CREATE TABLE statement but
using a composite primary key
cqlsh:training> CREATE TABLE employees (
                  company text,
                  name text,
                  age int,
                  role text,
                  PRIMARY KEY (company, name)
                );

Insert data into Employee
The INSERT command is similar to its SQL counterpart
The major difference is that the PRIMARY KEY is always required
If the same statement is executed twice, there will be no error
If the same PRIMARY KEY value is reused with different values for the other columns, then the last one wins!
cqlsh:training> INSERT INTO employee (name, age, role)
                VALUES ('john', 37, 'dev');
cqlsh:training> INSERT INTO employee (name, age, role)
                VALUES ('eric', 38, 'ceo');

Retrieving data from Employee table (II)
A restriction on a column other than the PRIMARY KEY won't work
This can be solved with an index (but be careful, de-normalization is the better choice)
cqlsh:training> SELECT * FROM employee
                WHERE age = 37;
Bad Request: No indexed columns present in by-columns clause with Equal operator
cqlsh:training> CREATE INDEX employee_age_idx
                ON employee (age);
cqlsh:training> SELECT * FROM employee
                WHERE age = 37;
 name | age | role
------+-----+------
 john |  37 | dev
(1 rows)

Update data in Employee
The UPDATE statement is similar to the SQL UPDATE command
Just as with INSERT, the PRIMARY KEY column must be specified as part of the UPDATE
In CQL the UPDATE does not check for the existence of the row; if the row does not exist, CQL will just create it
cqlsh:training> UPDATE employee SET age = 38
                WHERE name = 'john';

Cassandra Data Types
Category | CQL Data Type | Description
String   | ascii         | US-ASCII character string
String   | text          | UTF-8 encoded string, used most of the time for storing string data
String   | varchar       | UTF-8 encoded string
String   | inet          | Used for storing IP addresses
Numeric  | int           | 32-bit signed integer
Numeric  | float         | 32-bit IEEE-754 floating point
Numeric  | double        | 64-bit IEEE-754 floating point
Numeric  | varint        | Arbitrary-precision integer
Numeric  | bigint        | 64-bit signed integer, equivalent to long
Numeric  | decimal       | Variable-precision decimal
Numeric  | counter       | Distributed counter value (64-bit long)

Cassandra Data Types (II)
Category      | CQL Data Type | Description
UUIDs         | uuid          | A UUID in standard UUID format
UUIDs         | timeuuid      | Type 1 UUID only, for storing unique time-based IDs
Collections   | list          | Ordered collection of one or more elements
Collections   | map           | Collection of arbitrary key-value pairs
Collections   | set           | Unordered collection of one or more unique elements
Miscellaneous | boolean       | Boolean (true/false)
Miscellaneous | blob          | Used for storing binary data written in hexadecimal
Miscellaneous | timestamp     | Date/Time

Cassandra Data Types (III)
TimeUUID
• Has a few extra functions that allow extracting the time information
• now() returns a new TimeUUID based on the current timestamp and ensures globally unique values
• minTimeuuid() and maxTimeuuid() are used when querying ranges of TimeUUIDs, for example:
WHERE event_time > maxTimeuuid('2013-01-01 00:05+0000')
  AND event_time < minTimeuuid('2013-02-02 10:00+0000')
Counter
• Counter columns cannot be mixed with other column types
• The value cannot be set, only incremented/decremented by a specified amount
• Counters may not be part of the PRIMARY KEY of the table

Collections
CQL3 also supports collections for storing complex data structures
• Set {value,…}, List [value,…], Map {key:value,…}
cqlsh:training> CREATE TABLE collection_sample (
                  id int PRIMARY KEY,
                  string_set set<text>,
                  string_list list<text>,
                  string_map map<text, text>);
cqlsh:training> INSERT INTO collection_sample
                  (id, string_set, string_list, string_map)
                VALUES (1,
                  {'text1','text2','text1'},
                  ['text1','text2','text1'],
                  {'key1':'value1'});

Collections (II)
cqlsh:training> SELECT * FROM collection_sample;
 id | string_list                 | string_map         | string_set
----+-----------------------------+--------------------+--------------------
  1 | ['text1', 'text2', 'text1'] | {'key1': 'value1'} | {'text1', 'text2'}
(1 rows)

Counter Columns
Create a counter column table that counts “favorite” events
cqlsh:training> CREATE TABLE favorites (
                  product_id int,
                  month int,
                  number counter,
                  PRIMARY KEY (product_id, month));
cqlsh:training> UPDATE favorites SET number = number + 1
                WHERE product_id = 4910 AND month = 6;
cqlsh:training> SELECT * FROM favorites;
 product_id | month | number
------------+-------+--------
       4910 |     6 |      1

Time-to-Live (TTL) on Insert
Insert a row with a TTL in seconds (30s); after that the row is deleted
cqlsh:training> INSERT INTO employee (name, age, role)
                VALUES ('bob', 29, 'dev')
                USING TTL 30;
cqlsh:training> SELECT TTL(role)
                FROM employee WHERE name='bob';
 ttl(role)
-----------
        22
cqlsh:training> SELECT TTL(role)
                FROM employee WHERE name='bob';
(0 rows)

Trivadis / DataStax Partnership
• Since December 2014 we are a DataStax silver partner
• DataStax Partner Network (DSPN)
• Available certifications
• Admin
• Developer
• Architect
• Currently only one other partner in Switzerland: Intersys
• http://www.datastax.com/partners

Questions and answers ...
Ulises Fasoli
Senior consultant
+41 21 321 47 00
ulises.fasoli@trivadis.com
