Cassandra
How Stuff Works
Sergey Enin
(sergey_enin@epam.com)
Introduction to Cassandra
Cassandra presentation

  1. Cassandra: How Stuff Works
     Sergey Enin (sergey_enin@epam.com)
  2. Agenda
     • Introduction
     • Architecture
     • Partitioning & Replication
     • Data management
     • Data model
  3. Introduction
  4. Introduction: Selected Cases
     Who uses Cassandra?
     • eBay has Cassandra supporting multiple applications (Social Signals, Hunch, and many time series use cases) with clusters spanning several data centers.
     • Netflix uses Cassandra on AWS as a key infrastructure component of its globally distributed streaming product.
     • Shazam uses a Cassandra cluster to power its recommendation system.
     And many others; see http://www.datastax.com/cassandrausers
  5. Introduction: Main Advantages
     The main advantages of Cassandra are:
     • Fast writes.
     • Tunable consistency.
     • Decentralization.
     • Integration with Hadoop.
  6. Architecture
  7. Architecture: Fast Writes
     Cassandra is very fast on writes because it uses a log-structured merge tree (LSM-tree).
     [Figure: process of inserting a new record into Cassandra]
  8. Architecture: Fast Writes
     How the LSM-tree is implemented: memtables and SSTables.
     1. Commit log: all data is written to the commit log for durability.
     2. SSTables are immutable. A row is typically stored across multiple SSTable files.
     3. Each SSTable has a bloom filter associated with it. The bloom filter is used to check whether a requested row key exists in the SSTable before doing any disk seeks.
     4. Tombstones: deleted data is not immediately removed from disk; a delete is recorded as a tombstone marker. A deleted column can reappear if a replica that missed the delete is not repaired before the tombstone is purged.
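
The write path on this slide can be sketched as a toy model. This is only an illustration under stated assumptions, not Cassandra's implementation: the names `ToyLSM` and `BloomFilter`, the flush threshold, and the MD5-based hash positions are all made up for the example, and compaction and commit-log truncation are left out.

```python
import hashlib

TOMBSTONE = object()  # marker stored in place of a deleted value


class BloomFilter:
    """Tiny bloom filter: may give false positives, never false negatives."""

    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class ToyLSM:
    def __init__(self, flush_threshold=2):
        self.commit_log = []   # append-only, written first for durability
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # (bloom filter, frozen data) pairs, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self._flush()

    def delete(self, key):
        # A delete is a write of a tombstone; nothing is removed in place.
        self.write(key, TOMBSTONE)

    def _flush(self):
        bloom = BloomFilter()
        for key in self.memtable:
            bloom.add(key)
        # Once appended, this table is never modified again.
        self.sstables.append((bloom, dict(self.memtable)))
        self.memtable = {}

    def read(self, key):
        value = None
        if key in self.memtable:
            value = self.memtable[key]
        else:
            # Newest table wins; the bloom filter lets us skip most tables.
            for bloom, table in reversed(self.sstables):
                if bloom.might_contain(key) and key in table:
                    value = table[key]
                    break
        return None if value is TOMBSTONE else value
```

Because a newer table shadows older ones, the same row can live in several SSTables at once, which is why reads consult them newest-first.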
  9. Architecture: Network Architecture
     • All nodes are peers (no master).
     • The client specifies a set of Cassandra nodes and connects to the first live node.
     • Nodes use a gossip protocol to exchange cluster state.
  10. Partitioning & Replication
  11. Partitioning & Replication: Data Partitioning
      The partitioner determines where the first replica of a row lives in the ring.
      • RandomPartitioner: the default strategy (up to Cassandra 1.1; Murmur3Partitioner is the default from 1.2); spreads load roughly evenly across all nodes.
      • ByteOrderedPartitioner: orders rows lexically by key bytes and allows range scans; not recommended.
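
A hash-based partitioner such as the one on this slide can be illustrated roughly as follows. The sketch assumes an MD5-derived token in a 2^127 token space and four hypothetical nodes with evenly spaced tokens; `token_for` and `first_replica` are illustrative names, not Cassandra APIs.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space assumed for this sketch


def token_for(key: str) -> int:
    """Hash a row key to a position on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


def first_replica(ring, key):
    """ring: list of (token, node_name) sorted by token.

    The owner is the first node whose token is >= the key's token,
    wrapping around to the start of the ring past the last token.
    """
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_left(tokens, token_for(key))
    return ring[idx % len(ring)][1]


# Four hypothetical nodes with evenly spaced tokens.
ring = [(i * RING_SIZE // 4, f"node{i}") for i in range(4)]
```

Since the hash scatters keys uniformly, each node ends up owning about a quarter of the rows, but neighbouring keys land on unrelated nodes, which is why range scans need an order-preserving partitioner instead.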
  12. Partitioning & Replication: Replication
      Replication = replication factor + replica placement strategy.
      Replica placement strategies:
      SimpleStrategy:
      • the default strategy;
      • does not take network topology into account.
      NetworkTopologyStrategy:
      • preferred when you have information about the network layout of your nodes.
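
The difference between the two placement strategies can be sketched as below. This is a simplified model under assumptions: nodes are given in ring order, the data-center-aware variant ignores racks, and both function names are hypothetical.

```python
def simple_strategy_replicas(ring_nodes, first_index, replication_factor):
    """Take the first replica and the next nodes clockwise, ignoring topology.

    ring_nodes: node names in ring (token) order.
    """
    count = min(replication_factor, len(ring_nodes))
    return [ring_nodes[(first_index + i) % len(ring_nodes)] for i in range(count)]


def topology_aware_replicas(ring_nodes, first_index, replicas_per_dc):
    """Walk the ring clockwise until each data center has its quota.

    ring_nodes: list of (node_name, datacenter) in ring order.
    replicas_per_dc: e.g. {"dc1": 2, "dc2": 1}. Racks are ignored here.
    """
    remaining = dict(replicas_per_dc)
    chosen = []
    for step in range(len(ring_nodes)):
        name, dc = ring_nodes[(first_index + step) % len(ring_nodes)]
        if remaining.get(dc, 0) > 0:
            chosen.append(name)
            remaining[dc] -= 1
    return chosen
```

With the topology-aware walk, a node is skipped once its data center already holds enough replicas, so copies are spread across data centers rather than clustered on ring neighbours.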
  13. Data Management
  14. Data Management: Data Accessing
      Reads and writes:
      • Tunable consistency. The consistency level specifies how many replicas must respond before a read or write request is considered successful (a write is still sent to all replicas).
      • Batches: a batch sets a global consistency level and a client-supplied timestamp for all columns written by the statements in the batch.
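
How a consistency level translates into a number of required replica responses can be sketched like this. Only a handful of levels are modelled, and the helper names `required_acks` and `reads_see_latest_write` are assumptions for the example.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for the given consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[level]


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf
```

For a replication factor of 3, QUORUM reads plus QUORUM writes overlap in at least one replica, whereas ONE plus ONE does not, trading consistency for latency.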
  15. Data Management: ACID
      • Atomicity: writes are atomic at row level.
      • Consistency: tunable consistency.
      • Isolation: writes are invisible until they are complete.
      • Durability: writes are durable.
      • Read repair, anti-entropy node repair, hinted handoff.
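
Of the repair mechanisms named on this slide, read repair is the easiest to illustrate. The sketch below assumes each replica is a plain dict mapping a key to a `(timestamp, value)` pair and that the newest timestamp wins; `read_with_repair` is a hypothetical helper, not a Cassandra API.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and push it to stale replicas.

    replicas: list of dicts mapping key -> (timestamp, value).
    """
    responses = [replica.get(key) for replica in replicas]
    found = [resp for resp in responses if resp is not None]
    if not found:
        return None
    newest = max(found, key=lambda pair: pair[0])  # last write wins
    for replica, resp in zip(replicas, responses):
        if resp != newest:
            replica[key] = newest  # repair a stale or missing copy
    return newest[1]
```

After one such read, every replica that was consulted holds the same version of the row.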
  16. Data Model
  17. Data Model: Cassandra's Data Model
      • Relational databases: you design the schema based on entities and relationships.
      • Cassandra: you design the schema based on the queries you want to run.
  18. Data Model: Indexes
      An index is a data structure that allows fast, efficient lookup of data matching a given condition.
      • Primary key: the unique key used to identify each row in a table.
      • Secondary indexes: indexes on column values.
  19. Data Model: CQL3
      cqlsh> INSERT INTO users (user_name, password)
             VALUES ('jsmith', 'ch@ngem3a');
      cqlsh> SELECT * FROM users WHERE user_name='jsmith';

       user_name | password  | state
      -----------+-----------+-------
          jsmith | ch@ngem3a |  null
  20. Thank you!