Cassandra presentation

Introduction to Cassandra

Published in: Technology

Cassandra presentation

  1. Cassandra: How Stuff Works. Sergey Enin (sergey_enin@epam.com)
  2. Agenda
     • Introduction
     • Architecture
     • Partitioning & Replication
     • Data management
     • Data model
  3. Introduction
  4. Introduction: Selected cases. Who uses Cassandra?
     • eBay has Cassandra supporting multiple applications (Social Signals, Hunch, and many time series use cases) with clusters spanning several data centers.
     • Netflix uses Cassandra on AWS as a key infrastructure component of its globally distributed streaming product.
     • Shazam uses a Cassandra cluster to power its recommendation system.
     • And many others; see http://www.datastax.com/cassandrausers
  5. Introduction: Main advantages. The main advantages of Cassandra are:
     • Fast writes.
     • Tunable consistency.
     • Decentralization.
     • Integration with Hadoop.
  6. Architecture
  7. Architecture: Fast writes. Cassandra is very fast on writes because it uses a log-structured merge tree (LSM-tree). (Diagram: the process of inserting a new record into Cassandra.)
  8. Architecture: Fast writes. How the LSM-tree is implemented: memtables and SSTables.
     1. Commit log: all data is written to the commit log for durability.
     2. SSTables are immutable. A row is typically stored across multiple SSTable files.
     3. Each SSTable has a bloom filter associated with it. The bloom filter is used to check whether a requested row key exists in the SSTable before doing any disk seeks.
     4. Deleted data is not immediately removed from disk. A delete writes a marker called a tombstone, and the data is purged later during compaction. A deleted column can reappear if a replica that missed the delete is not repaired before the tombstone is purged.
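
The write path on slide 8 can be illustrated with a toy model. This is a minimal sketch, not Cassandra's implementation: the class name `TinyLSM`, the `memtable_limit` parameter, and the in-memory "SSTables" are all hypothetical, and bloom filters and compaction are left out.

```python
from types import MappingProxyType

TOMBSTONE = object()  # marker written on delete instead of removing data in place


class TinyLSM:
    """Toy LSM store: commit log + in-memory memtable + read-only SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write (durability)
        self.memtable = {}     # most recent writes, held in memory
        self.sstables = []     # flushed tables, oldest first, never modified
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the commit log
        self.memtable[key] = value            # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write: it stores a tombstone.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Freeze the memtable into a key-sorted, read-only table.
        ordered = dict(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.sstables.append(MappingProxyType(ordered))
        self.memtable = {}

    def get(self, key):
        # Check the memtable first, then SSTables from newest to oldest.
        for table in [self.memtable, *reversed(self.sstables)]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None
```

Writes never touch existing tables, which is why they are cheap; reads may have to consult several tables, which is the cost the bloom filters are meant to reduce.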
  9. Architecture: Network architecture
     • All nodes are peers (there is no master).
     • A client specifies a set of Cassandra nodes and connects to the first live node.
     • Nodes communicate using a gossip protocol.
  10. Partitioning & replication
  11. Partitioning & replication: Data partitioning. The partitioner determines where the first replica of a row lives in the ring.
     • RandomPartitioner: the default strategy; spreads load roughly evenly across all nodes.
     • ByteOrderedPartitioner: orders rows lexically by key bytes and allows range scans; not recommended.
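
As an illustration of hash-based partitioning, the sketch below maps a row key to a token with MD5 and finds the node that owns it on a ring. It is a simplification under stated assumptions: the `Ring` class, the evenly spaced node tokens, and the modulo step in `token_for` are illustrative choices, not the exact RandomPartitioner algorithm.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space for this sketch


def token_for(key: str) -> int:
    """Hash a row key to a position on the ring (simplified)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    def __init__(self, node_names):
        self.nodes = list(node_names)
        # Assumption: nodes are given evenly spaced tokens, in ring order.
        step = RING_SIZE // len(self.nodes)
        self.tokens = [i * step for i in range(len(self.nodes))]

    def first_replica_index(self, key: str) -> int:
        # The owner is the first node whose token is >= the key's token,
        # wrapping around to the start of the ring if there is none.
        idx = bisect.bisect_left(self.tokens, token_for(key))
        return idx % len(self.nodes)

    def first_replica(self, key: str) -> str:
        return self.nodes[self.first_replica_index(key)]
```

Because the hash scatters keys uniformly, each node ends up with a similar share of rows, but adjacent keys land on unrelated nodes, which is why range scans need an order-preserving partitioner.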
  12. Partitioning & replication: Replication. Replication = replication factor + replica placement strategy. Replica placement strategies:
     • SimpleStrategy: the default strategy; does not take network topology into account.
     • NetworkTopologyStrategy: preferred when you know the network layout of your nodes.
  13. Data management
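
A rough sketch of topology-unaware placement: once the partitioner has picked the node for the first replica, the remaining replicas go to the next nodes clockwise on the ring. The function name and its arguments are hypothetical, chosen only for this example.

```python
def simple_strategy_replicas(ring_nodes, first_replica_index, replication_factor):
    """Return the nodes holding a row, walking the ring clockwise.

    ring_nodes is the list of node names in ring (token) order;
    first_replica_index is the position chosen by the partitioner.
    """
    if replication_factor > len(ring_nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return [
        ring_nodes[(first_replica_index + offset) % len(ring_nodes)]
        for offset in range(replication_factor)
    ]
```

For example, with four nodes in ring order and the first replica on the last one, a replication factor of 3 wraps around to the first two nodes. A topology-aware strategy would additionally skip nodes to spread replicas across racks or data centers.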
  14. Data management: Data access. Reads and writes:
     • Tunable consistency. The consistency level specifies how many replicas must respond to a read or write request (writes are still sent to all replicas).
     • Batches: a batch sets a global consistency level and a client-supplied timestamp for all columns written by the statements in the batch.
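
The arithmetic behind tunable consistency can be sketched as follows. The level names ONE, QUORUM, and ALL are Cassandra consistency levels; the helper functions themselves are hypothetical and cover only these three levels. A read is guaranteed to overlap the latest acknowledged write when the read and write replica counts together exceed the replication factor.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of replicas
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported level in this sketch: " + level)


def read_overlaps_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    reads = required_acks(read_level, replication_factor)
    writes = required_acks(write_level, replication_factor)
    return reads + writes > replication_factor
```

With a replication factor of 3, QUORUM reads plus QUORUM writes overlap (2 + 2 > 3), while ONE plus ONE does not (1 + 1 ≤ 3), trading consistency for lower latency.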
  15. Data management: ACID
     • Atomicity: writes are atomic at row level.
     • Consistency: tunable consistency.
     • Isolation: writes are invisible until they are complete.
     • Durability: writes are durable.
     • Read-repair, anti-entropy node repair, hinted handoff.
  16. Data model
  17. Data model: Cassandra's data model
     • Relational databases: you design the schema based on entities and relationships.
     • Cassandra: you design the schema based on the queries you want to perform.
  18. Data model: Indexes. An index is a data structure that allows for fast, efficient lookup of data matching a given condition.
     • Primary key: the unique key used to identify each row in a table.
     • Secondary indexes: indexes on column values.
  19. Data model: CQL3

     cqlsh> INSERT INTO users (user_name, password) VALUES ('jsmith', 'ch@ngem3a');
     cqlsh> SELECT * FROM users WHERE user_name='jsmith';

      user_name | password  | state
     -----------+-----------+-------
         jsmith | ch@ngem3a |  null

  20. Thank you!
