Apache Cassandra - Future without Boundaries (part 2)

August 6, 2015 www.ExigenServices.com
Apache Cassandra – Future without Boundaries

Data model
[Keyspace][ColumnFamily][RowKey][ColumnName]
Keyspace
Column Family
RowKey1: Column1 = Value1, Column2 = Value2, Column3 = Value3
RowKey2: Column1 = Value1, Column4 = Value4
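
The four-part address [Keyspace][ColumnFamily][RowKey][ColumnName] can be pictured as a map of maps. The snippet below is only an in-memory sketch of that idea, using the placeholder names from the slide; `get_value` is a hypothetical helper, not a Cassandra API.

```python
# Illustrative only: nested dictionaries standing in for
# [Keyspace][ColumnFamily][RowKey][ColumnName] -> value.
data = {
    "Keyspace": {
        "ColumnFamily": {
            # Rows in one column family need not share the same columns.
            "RowKey1": {"Column1": "Value1", "Column2": "Value2", "Column3": "Value3"},
            "RowKey2": {"Column1": "Value1", "Column4": "Value4"},
        }
    }
}


def get_value(store, keyspace, column_family, row_key, column_name):
    """Hypothetical helper: walk the four levels to reach one value."""
    return store[keyspace][column_family][row_key][column_name]
```

For example, `get_value(data, "Keyspace", "ColumnFamily", "RowKey2", "Column4")` walks all four levels to reach a single value.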

IV. Data model example - Twissandra

Twissandra Use Cases
• Get the friends of a username
• Get the followers of a username
• Get a timeline of a specific user’s tweets
• Create a tweet
• Create a user
• Add friends to a user

Twissandra – DB User
User
id
user_name
password

Twissandra - DB Followers
User
id
user_name
password
Followers
user_id
follower_id

Twissandra - DB Following
User
id
user_name
password
Following
user_id
following_id

Twissandra – DB Tweets
User
id
user_name
password
Tweet
id
user_id
body
timestamp

Twissandra column families
• User
• Username
• Friends, Followers
• Tweet
• Userline
• Timeline

Twissandra – Users CF
<<CF>> User
<<RowKey>> userid
+ username
+ password
<<CF>> Username
<<RowKey>> username
+ userid

Twissandra – Friends and Followers CFs
<<CF>> Friends
<<RowKey>> userid
+ friendid: timestamp (column name: value)
<<CF>> Followers
<<RowKey>> userid
+ followerid: timestamp (column name: value)

Twissandra – Tweet CF
<<CF>> Tweet
<<RowKey>> tweetid
+ userid
+ body
+ timestamp

Twissandra – Userline and Timeline CFs
<<CF>> Userline
<<RowKey>> userid
+ timestamp: tweetid (column name: value)
<<CF>> Timeline
<<RowKey>> userid
+ timestamp: tweetid (column name: value)

Cassandra QL – User creation
BEGIN BATCH
INSERT INTO User (KEY, username, password) VALUES ('id', 'konstantin', '******')
INSERT INTO Username (KEY, userid) VALUES ('konstantin', 'id')
APPLY BATCH

Cassandra QL – following a friend
BEGIN BATCH
INSERT INTO Friends (KEY, 'friendid') VALUES ('userid', 123456)
INSERT INTO Followers (KEY, 'userid') VALUES ('friendid', 123456)
APPLY BATCH

Cassandra QL – Tweet creation
BEGIN BATCH
INSERT INTO Tweet (KEY, userid, body, timestamp) VALUES ('tweetid', 'userid', '@ericflo thanks for Twissandra, it helps!', 123656459847)
INSERT INTO Userline (KEY, 123656459847) VALUES ('userid', 'tweetid')
INSERT INTO Timeline (KEY, 123656459847) VALUES ('userid', 'tweetid')
……
INSERT INTO Timeline (KEY, 123656459847) VALUES ('followerid', 'tweetid')
……
APPLY BATCH

Cassandra QL – Getting user tweets
SELECT * FROM Userline WHERE KEY = 'userid'
SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …., 'tweetidn')

Cassandra QL – Getting user timeline
SELECT * FROM Timeline WHERE KEY = 'userid'
SELECT * FROM Tweet WHERE KEY IN ('tweetid1', 'tweetid2', 'tweetid3', …., 'tweetidn')
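
The write-time fan-out and the two-step read shown in the CQL slides can be modelled in memory to make the access pattern easier to follow. This is a sketch under simplifying assumptions: plain dictionaries stand in for the column families, and the function names (`follow`, `create_tweet`, `get_timeline`) are hypothetical, not part of Twissandra or any Cassandra driver.

```python
from collections import defaultdict

# One dictionary per column family: row key -> {column name: value}.
tweets = {}                      # Tweet:     tweetid -> columns
friends = defaultdict(dict)      # Friends:   userid -> {friendid: timestamp}
followers = defaultdict(dict)    # Followers: userid -> {followerid: timestamp}
userline = defaultdict(dict)     # Userline:  userid -> {timestamp: tweetid}
timeline = defaultdict(dict)     # Timeline:  userid -> {timestamp: tweetid}


def follow(user_id, friend_id, ts):
    """Record the relationship in both directions, as the batch does."""
    friends[user_id][friend_id] = ts
    followers[friend_id][user_id] = ts


def create_tweet(tweet_id, user_id, body, ts):
    """Store the tweet once, then fan its id out to every timeline."""
    tweets[tweet_id] = {"userid": user_id, "body": body, "timestamp": ts}
    userline[user_id][ts] = tweet_id
    timeline[user_id][ts] = tweet_id
    for follower_id in followers[user_id]:
        timeline[follower_id][ts] = tweet_id


def get_timeline(user_id):
    """Step 1: read tweet ids from Timeline. Step 2: fetch the tweets."""
    row = timeline[user_id]
    tweet_ids = [row[ts] for ts in sorted(row, reverse=True)]
    return [tweets[tweet_id] for tweet_id in tweet_ids]
```

Reads stay cheap because the expensive work (copying the tweet id into each follower's row) is done once, at write time.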

Design patterns
• Materialized View
– create a second column family to serve additional queries
• Valueless Column
– use column names to hold the values
• Aggregate Key
– if you need to find sub-items, use a composite key

V. Architecture

Partitioners
Partitioners decide where a key maps onto the ring.
[Figure: Key 1, Key 2, Key 3 and Key 4 placed at positions on the ring]

Partitioners
• RandomPartitioner
• OrderPreservingPartitioner
• ByteOrderedPartitioner
• CollatingOrderPreservingPartitioner
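
As a rough illustration of how a hash-based partitioner such as RandomPartitioner maps a key onto the ring, the sketch below hashes the key with MD5 and picks the first node whose token is at or after the key's token, wrapping around at the end. It is a simplification: reducing the digest with a modulo is an assumption made for brevity, and the function names are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # size of the RandomPartitioner token space


def token_for(key):
    """Map a key to a token (simplified: MD5 digest reduced modulo the ring)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner_of_token(ring, token):
    """ring maps node token -> node name; a node owns (previous token, own token]."""
    tokens = sorted(ring)
    index = bisect.bisect_left(tokens, token)
    return ring[tokens[index % len(tokens)]]  # wrap past the highest token


def owner(ring, key):
    return owner_of_token(ring, token_for(key))
```

Because the hash spreads keys evenly, rows are balanced across nodes, but the original key order is lost; the order-preserving partitioners make the opposite trade-off.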

Replication
• Replication is controlled by the replication_factor setting in the keyspace definition.
• The actual placement of replicas in the cluster is determined by the replica placement strategy.

Placement Strategies
• SimpleStrategy - returns the nodes that are next to each other on the ring.

• OldNetworkTopologyStrategy - places one replica in a different data center while placing the others on different racks in the current data center.

• NetworkTopologyStrategy - allows you to configure the number of replicas per data center as specified in the strategy_options.
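
The SimpleStrategy rule ("the nodes that are next to each other on the ring") can be sketched as a clockwise walk from the node that owns the key. The function below is a hypothetical illustration that ignores racks and data centers entirely, which is exactly what distinguishes it from the topology-aware strategies.

```python
def simple_strategy(ring, primary_index, replication_factor):
    """Pick replicas by walking clockwise from the primary node.

    ring: node names ordered by token.
    primary_index: position of the node that owns the key.
    """
    count = min(replication_factor, len(ring))  # cannot exceed cluster size
    return [ring[(primary_index + step) % len(ring)] for step in range(count)]
```

With a four-node ring `["A", "B", "C", "D"]`, a key owned by `C` and a replication factor of 3, the replicas are `C`, `D` and then `A` after wrapping around.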

Snitches
Snitches give Cassandra information about the network topology of the cluster.
• Endpoint snitch – maps nodes to racks and data centers
• Dynamic snitch – monitors read latencies
Endpoint Snitch Implementations
• SimpleSnitch (default) - can be efficient for locating nodes in clusters limited to a single data center.

• RackInferringSnitch - extrapolates the topology of the network by analyzing IP addresses.
– 192.168.191.71 and 192.168.191.21: in the same rack
– 192.168.191.71 and 192.168.171.21: in the same datacenter
– 192.78.19.71 and 192.18.11.21: in different datacenters

• PropertyFileSnitch - determines the location of nodes by referring to a user-defined description of the network details located in the property file cassandra-topology.properties.
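
The RackInferringSnitch address examples in this list follow a simple rule: the second octet of the IP address identifies the data center and the third octet identifies the rack. A minimal sketch of that rule, with hypothetical function names:

```python
def infer_location(ip):
    """Second octet -> data center, third octet -> rack."""
    octets = ip.split(".")
    return {"datacenter": octets[1], "rack": octets[2]}


def relation(ip_a, ip_b):
    """Describe how two nodes relate under the inferred topology."""
    a, b = infer_location(ip_a), infer_location(ip_b)
    if a["datacenter"] != b["datacenter"]:
        return "different datacenters"
    if a["rack"] != b["rack"]:
        return "same datacenter"
    return "same rack"
```

This only works when the address plan really mirrors the physical layout; otherwise an explicit mapping such as the one PropertyFileSnitch reads is the safer choice.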

Memtables, SSTables, Commit Logs
Commit Log
• durability
• sequential writes only
Memtable
• no disk access, batched writes
SSTable
• becomes read-only
• indexes
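
How the three structures cooperate on a write can be sketched with a toy node: every write is appended to the commit log and applied to the memtable, and a full memtable is flushed to a new, immutable SSTable. The class below is an assumption-laden illustration (lists and tuples instead of files, a tiny flush threshold), not Cassandra's implementation.

```python
class ToyNode:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only; stands in for sequential disk writes
        self.memtable = {}     # in memory, mutable
        self.sstables = []     # each entry is immutable once written
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value            # no disk seek needed
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Sorted tuple = read-only, key-ordered SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for stored_key, value in sstable:
                if stored_key == key:
                    return value
        return None
```

The read method shows why reads cost more than writes: a key may have to be looked up in the memtable and then in several SSTables.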

Write properties
• No reads
• No seeks
• Fast
• Atomic within a single row
• Always writable

Read properties
• Read multiple SSTables
• Slower than writes (but still fast)
• Seeks can be mitigated with more RAM
• Amortized loss of scalability

Commit Log durability
• Periodic sync of the commit log. Writes are acknowledged before the log is synced, so recent writes can be lost if the node crashes.
• Batch sync of the commit log. A write is acknowledged only after the commit log has been flushed to disk. A separate device for the commit log is strongly recommended in this mode.

Gossip protocol
• Intra-ring communication
• Runs periodically
• Used for failure detection, hinted handoff and exchange of node state

Gossip protocol
• org.apache.cassandra.gms.Gossiper
– Keeps the list of nodes that are alive and dead
– Chooses a random node and starts a “chat” with it. One gossip round requires three messages
• Failure detection uses a suspicion level to decide whether a node is alive or dead

Hinted handoff
• Cassandra is always available for writes

Consistency level
Consistency level | Write | Read
ANY | 1 replica (including hinted handoff) | -
ONE | 1 | 1
QUORUM | N/2 + 1 | N/2 + 1
LOCAL_QUORUM (to avoid latency issues) | (dc_replicas)/2 + 1 (local datacenter) | (dc_replicas)/2 + 1 (local datacenter)
EACH_QUORUM (useful in backup scenarios) | (dc_replicas)/2 + 1 (each datacenter) | (dc_replicas)/2 + 1 (each datacenter)
ALL | N | N
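
The quorum sizes in the table use integer division, so N/2 + 1 is 2 for three replicas and 3 for four or five. A read is guaranteed to see the latest acknowledged write when the read and write replica sets must overlap, that is when R + W > N. The helper names below are hypothetical and exist only to make the arithmetic concrete.

```python
def quorum(replicas):
    """N/2 + 1 with integer division, as in the table."""
    return replicas // 2 + 1


def read_sees_latest_write(n, w, r):
    """True when any R replicas must overlap any W replicas (R + W > N)."""
    return r + w > n
```

With N = 3, QUORUM writes plus QUORUM reads give 2 + 2 > 3, while ONE plus ONE gives 1 + 1, which is not greater than 3, so a read at ONE may return stale data.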

Tombstones
• Data is not deleted immediately
• Deleted values are marked with a tombstone
• Tombstones are discarded during a later compaction
• GCGraceSeconds – the number of seconds the server waits before garbage-collecting a tombstone

Compaction
• Merging SSTables into one
– merging keys
– combining columns
– creating a new index
Main aims:
• Free up space
• Reduce the number of required seeks

Compaction
• Minor:
– Triggered when at least N SSTables have been flushed to disk (N is tunable, 4 by default)
– Merges SSTables of similar size
• Major:
– Merges all SSTables
– Done manually through nodetool compact
– Discards tombstones
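
The core of compaction, merging several SSTables so that the newest version of each key wins and expired tombstones disappear, can be sketched as follows. This is a simplified model: each SSTable is a dictionary of key to (value, timestamp), columns are not modelled separately, and `TOMBSTONE`, `compact` and the parameter names are assumptions made for the example.

```python
TOMBSTONE = object()  # marker stored in place of a deleted value


def compact(sstables, now, gc_grace_seconds):
    """Merge SSTables into one; each maps key -> (value, timestamp)."""
    merged = {}
    for sstable in sstables:
        for key, (value, ts) in sstable.items():
            # Keep only the most recent version of each key.
            if key not in merged or ts > merged[key][1]:
                merged[key] = (value, ts)
    # Drop tombstones older than the grace period; keep everything else.
    return {
        key: (value, ts)
        for key, (value, ts) in sorted(merged.items())
        if not (value is TOMBSTONE and now - ts >= gc_grace_seconds)
    }
```

A tombstone younger than the grace period survives the merge, so that replicas which missed the delete can still learn about it.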

Replica synchronization
• Anti-entropy
• Read repair

Anti-entropy
• During a major compaction, a node exchanges Merkle trees (hashes of its data) with other nodes
• If the trees do not match, the differing data is repaired
• Nodes maintain a timestamp index and exchange only the most recent updates
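
A Merkle tree lets two replicas compare large data sets by exchanging a small hash. The sketch below computes only the root hash over sorted rows; the choice of SHA-256, the row encoding and the function names are assumptions for illustration and do not reflect Cassandra's internal tree format.

```python
import hashlib


def _hash(data):
    return hashlib.sha256(data).digest()


def merkle_root(rows):
    """rows: list of (key, value) string pairs, sorted by key."""
    level = [_hash(f"{key}={value}".encode("utf-8")) for key, value in rows]
    if not level:
        return _hash(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd levels
        # Each parent is the hash of its two children.
        level = [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Equal roots mean the replicas hold the same rows; when the roots differ, comparing child hashes level by level narrows the mismatch to a small range instead of transferring everything.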

Read repair
• During a read operation, replicas with stale values are brought up to date
– Weak consistency level (ONE): after the data is returned
– Strong consistency level (ALL): before the data is returned
– Eventual consistency (QUORUM)
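
The repair step itself can be sketched independently of when it runs: compare the versions held by the replicas, take the one with the newest timestamp, and write it back to any replica that is missing it or holds an older one. The function name and the dictionary-based replicas are hypothetical, and the sketch does not model whether the repair happens before or after the result is returned.

```python
def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (value, timestamp)."""
    versions = [replica[key] for replica in replicas if key in replica]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[1])
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # bring the stale replica up to date
    return newest[0]
```

After one such read, every contacted replica agrees on the value, which is how frequently read rows converge without waiting for anti-entropy.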

Bloom filters
• A bit array
• Tests whether a value is a member of a set
• Reduces disk access (improves performance)

Bloom filters
• On write:
– several hashes are generated per key
– the bit for each hash is set
• On read:
– hashes are generated for the key
– if all bits for these hashes are set, the key may exist in the SSTable
– if at least one bit is not set, the key was never written to the SSTable
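
The write and read steps above fit in a few lines. This is a minimal sketch, assuming a fixed-size bit list and three hash positions derived from SHA-256 with different seeds; the class name, sizes and hashing scheme are illustrative choices, not what Cassandra uses.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=64, hash_count=3):
        self.size = size
        self.hash_count = hash_count
        self.bits = [0] * size

    def _positions(self, key):
        # One bit position per hash; the seed makes the hashes differ.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        """On write: set the bit for each hash of the key."""
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        """On read: False means definitely absent, True means possibly present."""
        return all(self.bits[position] for position in self._positions(key))
```

A `False` answer lets a read skip the SSTable entirely; a `True` answer can be a false positive, so the SSTable must still be checked.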

Bloom filters
[Figure: Key1 and Key2 each pass through Hash1, Hash2 and Hash3 into a shared bit array (1 0 0 0 0 0 0 1 0 0 1) in front of the SSTable; one path is labelled Write, the other Read]
