Pythian: My First 100 days with a Cassandra Cluster

My First 100 days with a Cassandra Cluster
Presented by:
Gustavo René Antúnez, DBA Team Lead
Carlos Rolo, Cassandra MVP
September 2015

Welcome to Cassandra Summit 2015

About Pythian
• 18 years of data infrastructure management consulting
• 200+ top brands
• 6000+ databases under management
• Over 400 DBAs in 35 countries
• Top 5% of DBA workforce, 9 Oracle ACEs, 2 Microsoft MVPs, 1 Cassandra MVP
• Oracle, Microsoft, MySQL, DataStax partners, Netezza, Hadoop and MongoDB, plus UNIX sysadmin and Oracle apps

Where does René come from
– Oracle DBA
  • Started with version 9.2 in 2004
– Speaker at Oracle Open World, Developers Day and Collaborate
– Apress Q1 2016: “Practical Data Refresh”
– Movie fanatic & music lover
– Bringing the best from México (Mexihtli) to the rest of the world and in the process photographing it :)
– rene-ace.com
– @rene_ace

Where does Carlos come from
• Cassandra Consultant
• First contact was 0.8
• Cassandra MVP & DataStax Certified Architect
• Lisbon Cassandra Meetup
• Passion for distributed systems
• Loves a good challenge
• Waterpolo is my sport
• @cjrolo

6th Happiest Job of 2015!
http://www.forbes.com/sites/susanadams/2014/03/20/the-happiest-and-unhappiest-jobs-in-2014/
• Work-life balance
• Relationship with boss and co-workers
• Daily tasks
• Job resources
• Field will grow by 15% between 2012 and 2022
• DBA can be the key driver of success

Happiest Job of 2034?
Oxford University: THE FUTURE OF EMPLOYMENT: HOW SUSCEPTIBLE ARE JOBS TO COMPUTERISATION?
• 47 percent of American jobs are at high risk of being taken by computers within the next two decades.
– 1st Wave
  • Computers will start replacing people in especially vulnerable fields like transportation/logistics, production labor, and administrative support.
– 2nd Wave
  • Dependent upon the development of good artificial intelligence. This could next put jobs in management, science and engineering, and the arts at risk.

What is Cassandra?
• NoSQL database, developed in Java
• Fully distributed DB
  • Meaning that there is no master DB, unlike Oracle or MySQL.
• Linearly scalable
• Based on 2 core technologies, Google’s Bigtable and Amazon’s Dynamo
• 2 versions of Cassandra
  • Community Edition: distributed under the Apache™ License
  • Enterprise Edition: distributed by DataStax

CAP Theorem
• In a distributed system you can only have two out of the following three guarantees across a write/read pair:
  • Consistency: A read is guaranteed to return the most recent write for a given client.
  • Availability: A non-failing node will return a reasonable response within a reasonable amount of time (no error or timeout).
  • Partition Tolerance: The system will continue to function when network partitions occur.

What is Cassandra?
• Cassandra is a BASE (Basically Available, Soft state, Eventually consistent) type system
• Not an ACID (Atomicity, Consistency, Isolation, Durability) type system

It Can be as easy as …
• Start your machine and install the following:
  • ntp (packages are normally ntp, ntpdate and ntp-doc)
  • wget (unless you have your packages copied over via other means)
  • vim (or your favorite text editor)
• Yum Package Management
• Root or sudo access to the install machine
• Latest version of Oracle Java SE Runtime Environment (JRE) 8 (recommended) or OpenJDK 7.
• Python 2.6+ (needed if installing OpsCenter)

It Can be as easy as …
• Install Cassandra.
~$ sudo yum install dsc21-2.1.5-1 cassandra21-2.1.5-1
• Install optional utilities.
~$ sudo yum install cassandra21-tools-2.1.5-1
• Stop the Cassandra service and clear the system data.
~$ sudo service cassandra stop
~$ sudo rm -rf /var/lib/cassandra/data/system/*
• In the cassandra-rackdc.properties file
# indicate the rack and dc for this node
dc=Pythian
rack=RAC1
• Start the Cassandra service.
~$ sudo service cassandra start

Where is everything in Cassandra?
Directories | Description
/var/lib/cassandra | Data directories
/var/log/cassandra | Log directory
/var/run/cassandra | Runtime files
/usr/share/cassandra | Environment settings
/usr/share/cassandra/lib | JAR files
/usr/bin | Optional utilities, such as sstablelevelreset, sstablerepairedset, and sstablesplit
/usr/bin | Binary files
/usr/sbin |
/etc/cassandra | Configuration files
/etc/init.d | Service startup script
/etc/security/limits.d | Cassandra user limits
/etc/default |
/usr/share/doc/cassandra/examples | Sample cassandra.yaml files for stress testing

I come from this world…
12c Version

Architecture…

Oracle…
[Diagram: Oracle storage structures. Physical: online redo logs, data files, control files, OS blocks. Logical: database, tablespace, segment, extent, Oracle data block, schema.]

RAC - For Node Point of Failure
[Diagram: RAC cluster of three nodes (Node1, Node2, Node3), each with an ASM instance and a DBB instance, sharing ASM disks and connected by the public, storage, ASM and CSS networks.]
Global Data Services
– Service Failover / Load Balancing

Dataguard - For Failover
[Diagram: Primary, Far Sync Instance and Standby, linked by SYNC and ASYNC transport for zero data loss failover.]

Cassandra Architecture
[Diagram: a Cassandra cluster with two datacenters (México and Portugal), racks (Rack 1, Rack 2) and nodes N1 to N4.]

One Ring to Rule them All
• The total amount of data managed by the cluster is represented as a ring
• Each node is assigned a part of the database to hold based on each table’s primary key.
• To guarantee both availability and durability, multiple nodes will be assigned to the same data.
• There is no master node; all nodes can perform all operations
[Diagram: ring of four nodes (1 to 4), each holding three key ranges: A-F,T-Z,M-S; G-L,A-F,T-Z; M-S,G-L,A-F; T-Z,M-S,G-L]
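
The range lists on this slide are what you would get if each range is stored on its owning node and on the next two nodes around the ring. The sketch below reproduces that layout. It is an illustration only: the node names, the assignment of A-F to node 1, and the simple "next nodes clockwise" placement rule are assumptions, and real placement depends on the replication strategy and snitch.

```python
# Hypothetical four-node ring; each node "owns" one primary key range.
RING = [("node1", "A-F"), ("node2", "G-L"), ("node3", "M-S"), ("node4", "T-Z")]


def ranges_held(ring, replication_factor):
    """Map each node to the key ranges it stores.

    Assumption: a range lives on its owner and on the next
    (replication_factor - 1) nodes clockwise around the ring.
    """
    held = {node: [] for node, _ in ring}
    for i, (_, key_range) in enumerate(ring):
        for step in range(replication_factor):
            replica, _ = ring[(i + step) % len(ring)]
            held[replica].append(key_range)
    return held


if __name__ == "__main__":
    for node, ranges in ranges_held(RING, replication_factor=3).items():
        print(node, sorted(ranges))
```

With a replication factor of 3 on four nodes, every node ends up holding three of the four ranges, so any single node can fail without a range becoming unavailable.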

Gossip
• Peer-to-peer communication protocol in which nodes periodically exchange state information
• Runs every second and exchanges state messages with up to three other nodes in the cluster
• Failure detection
  • It determines locally from gossip state and history if another node in the system is down or has come back up.
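
To make the "exchange state information" step concrete, here is a minimal sketch of how one node might merge a peer's view of the cluster into its own. The data layout (a dict of node name to a `(generation, heartbeat)` tuple) and the function name are assumptions for illustration; this is not Cassandra's actual gossip implementation or its failure detector.

```python
def merge_gossip(local, remote):
    """Merge a peer's cluster view into ours, keeping the newest entry per node.

    Each view maps node name -> (generation, heartbeat).
    Tuples compare generation first, then heartbeat, so a restarted
    node (higher generation) always wins over its older state.
    """
    merged = dict(local)
    for node, state in remote.items():
        if node not in merged or state > merged[node]:
            merged[node] = state
    return merged


if __name__ == "__main__":
    mine = {"n1": (1, 10), "n2": (1, 4)}
    theirs = {"n2": (1, 7), "n3": (2, 1)}
    print(merge_gossip(mine, theirs))
```

After a few rounds of such pairwise merges, every node converges on the same view without any central coordinator.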

Consistent Hashing
• A hash consists of one or more arithmetic operations on a piece of data
• Common way of load balancing across several nodes
• Hash function must have an upper and lower bound so objects can be mapped in a circle
• Common hash algorithms
– Simple checksums
– Message Digest (MD5)
– Secure Hash Algorithm (SHA-1/2)
– MurmurHash
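
A minimal sketch of a consistent-hash ring follows, assuming MD5 as the bounded hash function and one token per node. The class and function names are hypothetical, and this is not how Cassandra assigns tokens; it only shows keys and nodes being mapped onto the same circle.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a string to a position on the ring (MD5 gives a 128-bit bound)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    """Minimal consistent-hash ring with one token per node."""

    def __init__(self, nodes):
        self._entries = sorted((token_for(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def node_for(self, key: str) -> str:
        # First node token >= key token; wrap around past the upper bound.
        i = bisect.bisect_left(self._tokens, token_for(key))
        return self._entries[i % len(self._entries)][1]


if __name__ == "__main__":
    ring = HashRing(["n1", "n2", "n3"])
    print(ring.node_for("user:42"))
```

The useful property is that adding a node only moves the keys that fall into the new node's slice of the circle; all other keys stay where they were.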

Partitioners
• Determines how data is distributed across the nodes in the cluster
• Function for deriving a token representing a row from its partition key
Cassandra Offers:
– Murmur3Partitioner
– RandomPartitioner
– ByteOrderedPartitioner

Virtual Nodes
• Solution for avoiding calculating node tokens and thinking about the cluster size beforehand
• Each node has multiple virtual nodes
• Each virtual node owns a much smaller subset of data
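
The idea can be sketched by giving each physical node several positions on the hash ring instead of one. Everything below is an assumption for illustration: tokens are derived by hashing a made-up `node#vnodeN` label with MD5, whereas a real cluster assigns vnode tokens itself.

```python
import bisect
import hashlib
from collections import Counter


def token_for(value: str) -> int:
    """Hash a string to a position on the ring."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


def build_ring(nodes, vnodes_per_node):
    """Return (tokens, owners): sorted tokens and the node owning each one."""
    entries = sorted(
        (token_for(f"{node}#vnode{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )
    return [t for t, _ in entries], [n for _, n in entries]


def owner(ring, key):
    """Find the physical node whose virtual node covers this key."""
    tokens, owners = ring
    i = bisect.bisect_left(tokens, token_for(key))
    return owners[i % len(owners)]


def load(nodes, vnodes_per_node, n_keys=10_000):
    """Count how many sample keys land on each physical node."""
    ring = build_ring(nodes, vnodes_per_node)
    return Counter(owner(ring, f"key-{k}") for k in range(n_keys))


if __name__ == "__main__":
    print("1 token per node:   ", load(["n1", "n2", "n3"], 1))
    print("64 tokens per node: ", load(["n1", "n2", "n3"], 64))
```

Running it with different `vnodes_per_node` values lets you compare how evenly the sample keys spread across the three nodes.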

Coordinators
• Acts as a proxy between the client application and the nodes that own the data being requested
• Any client request can be sent to any node.

Snitch
• Is responsible for keeping all of the nodes up to date on what node has what data, what nodes are currently down, what nodes are bootstrapping, etc.
• It interprets the topology
The most popular are:
– Gossiping property file snitch
– EC2 Snitch
– EC2 Multi-region snitch
– Dynamic Snitch

Data is Stored in Keyspaces
Logical database container

A CASSANDRA TABLE OR COLUMN FAMILY
[Diagram: inside a Cassandra node. Coordinator and snitch; write path: commit log writer, memtable writer, memtable flush (SSTable writer); read path: reader, memtables, bloom filters; on disk: CommitLog and SSTables.]

A CASSANDRA TABLE OR COLUMN FAMILY
• Consists of one or more SSTables and 0 or more memtables
• SSTable stands for Sorted String Table.
  • I.e., all of the columns in the SSTable are sorted in order by key.
• Each SSTable consists of the data table, bloom filter, index and some other minor files.
• SSTables are immutable. Once written they are never altered, only read and eventually deleted.
SSTables on disk (/var/lib/cassandra):
videogames-events-data-jb-1.db
videogames-events-filters-jb-1.db
videogames-events-index-jb-1.db
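
Since each SSTable carries a bloom filter, a small sketch may help show what that structure buys you: a cheap "definitely not here" check before touching the data file. The class below is a generic, minimal Bloom filter with assumed sizes and SHA-256 based hashing; it is not Cassandra's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, some false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str):
        # Derive several bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means the key was never added; True means "maybe".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter()
    bf.add("row-1")
    print(bf.might_contain("row-1"), bf.might_contain("row-2"))
```

A read can skip any SSTable whose filter answers "no", which matters because a table may be spread over many immutable SSTables.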

REPLICATION FACTOR (RF) AND CONSISTENCY
• Replication Factor is the number of copies of columns stored in the ring
• Replication factor should not exceed the number of nodes in the cluster
– RF=1 is one copy; this means that the data for each column is stored only once in the ring.
– RF=3 (default) means every column stored in the database is stored three times.
– Quorum: The read and write must be acked/returned from a quorum of nodes.

REPLICATION FACTOR (RF) AND CONSISTENCY
• Consistency
– When a write or read is performed, the application can choose to wait for n copies of the data to be written or read; this is referred to as a consistency of n.
– There is a special consistency value called quorum, which means a response from RF/2+1 nodes (using integer division) is required.
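
The quorum arithmetic is easy to check in a few lines. The function names below are hypothetical; the second function encodes the commonly used rule that a read is guaranteed to overlap the latest write when the read and write replica counts together exceed RF.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for QUORUM: RF/2 + 1, rounded down."""
    return replication_factor // 2 + 1


def reads_see_latest_write(rf: int, write_cl: int, read_cl: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_cl + write_cl > rf


if __name__ == "__main__":
    for rf in (1, 3, 5):
        q = quorum(rf)
        print(f"RF={rf}: quorum={q}, overlap={reads_see_latest_write(rf, q, q)}")
```

For RF=3 the quorum is 2, so quorum writes plus quorum reads touch 4 replica slots out of 3 and must share at least one node, while writing and reading at a consistency of 1 gives no such guarantee.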

HOW TO MAKE SURE WE DON’T LOSE DATA
• Three anti-entropy mechanisms in Cassandra
1) Hinted handoff
2) Read repair
3) Repair, a.k.a. anti-entropy
COMPACTIONS
• SSTables are immutable.
• Deletes and updates are just new writes
• SSTables are merged together by partition key. Old, obsolete data is discarded.
• Lots of SSTables become a few.
• Compaction can require a lot of disk space. DO NOT LET your disks get more than 50% full.
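
The merge step can be sketched with dictionaries standing in for SSTables. This is a simplified model under stated assumptions: each table maps a key to a `(timestamp, value)` pair, a value of `None` stands for a delete marker, and deleted rows are dropped immediately, whereas a real cluster keeps delete markers for a grace period first.

```python
TOMBSTONE = None  # hypothetical marker for a deleted value


def compact(sstables):
    """Merge several SSTables into one, keeping the newest cell per key.

    Each SSTable is a dict: key -> (timestamp, value).
    """
    newest = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            if key not in newest or ts > newest[key][0]:
                newest[key] = (ts, value)
    # Drop deleted rows and emit in key order, since SSTables are sorted.
    return {k: v for k, v in sorted(newest.items()) if v[1] is not TOMBSTONE}


if __name__ == "__main__":
    old = {"a": (1, "x"), "b": (1, "y"), "c": (1, "keep")}
    new = {"a": (2, "z"), "b": (3, TOMBSTONE)}
    print(compact([old, new]))
```

Because the inputs are never modified, the merged output has to be written in full before the old files can be removed, which is why compaction needs spare disk space.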

CQL - Cassandra Query Language
CQL is not SQL
• Default and primary interface into the Cassandra Database (since 2.0)
• Cassandra does not support joins or subqueries
• Only way to create users and user based permissions
• Very similar syntax to SQL:
cqlsh> CREATE KEYSPACE sandbox WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 1 };
cqlsh> USE sandbox;
cqlsh:sandbox> CREATE TABLE data (id uuid, data text, PRIMARY KEY (id));
cqlsh:sandbox> INSERT INTO data (id, data) VALUES (c37d661d-7e61-49ea-96a5-68c34e83db3a, 'testing');
cqlsh:sandbox> SELECT * FROM data;

Feature/Function | DSE/Cassandra | Oracle RDBMS
Core architecture | “Masterless”; peer-to-peer with all nodes being the same | Traditional standalone
High availability | Continuous availability with built-in redundancy and hardware rack awareness in both single and multiple data centers | Oracle Dataguard (for failover) and Oracle RAC (Node SPOF), GoldenGate
Data model | Google Bigtable | Relational/tabular
Data consistency model | Tunable consistency (CAP theorem consistency) per operation | Traditional ACID
Storage model | Targeted directories with separation | Tablespaces
Logical database container | Keyspace | Database
Backup/recovery | Online, point-in-time restore | Online, point-in-time restore
Enterprise management/monitoring | DataStax OpsCenter | Oracle Enterprise Manager

LESSONS LEARNED
• Understand the data model differences
• Hardware setup does matter
• Grep the logs for errors and warnings
• Make sure each node is created properly
• Know your tools
  • nodetool utility
  • Cassandra bulk loader (sstableloader)
  • jconsole/Java VisualVM
  • cassandra-stress
  • OpsCenter

FIT-ACER
• F – Focus (SLOW DOWN! Are you ready?)
• I – Identify server/DB name, time, authorization
• T – Type the command (do not hit enter yet)
• A – Assess the command (SPEND TIME HERE!)
• C – Check the server / database name again
• E – Execute the command
• R – Review and document the results

To contact us
sales@pythian.com
1-877-PYTHIAN
To follow us
http://www.pythian.com/blog
http://www.facebook.com/pages/The-Pythian-Group/163902527671
@pythian
http://www.linkedin.com/company/pythian
Thank you – Q&A
