Cassandra internals

Cassandra internals Presentation Transcript

  • 1. Cassandra under the hood Richard Low rlow@acunu.com
  • 2. Outline
      • What happens when you write?
          • Commit logs
          • Memtables
          • SSTables
      • What happens when you read?
          • Point queries
          • Range queries
      • Repair and snapshots
      • [Example row shown on slide: "richard": { "email": "rlow@acunu.com" }]
  • 3. Why should we care?
      • Helps to understand performance
      • Helps to understand the performance implications of the data model
      • Helps to fix things if something goes wrong
      • Interesting!
  • 4. Writes
  • 5. Writes (2) [Diagram: Insert, Commit log, Memtable, SSTable { Bloom filter, Index, Data }]
  • 6. Commit log [Same diagram: Insert, Commit log, Memtable, SSTable { Bloom filter, Index, Data }]
  • 7. Commit log
      • Each insert is written to the commit log first
      • Stored in insertion order
      • Inserts are not acknowledged until written to the commit log
      • Batch vs periodic sync
      • In case of a crash, the log can be replayed
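The commit log bullets above can be sketched as a small append-only log with a replay step. This is a minimal illustration, not Cassandra's on-disk format: the `CommitLog` class, the `replay` function, and the JSON-lines encoding are all assumptions made for the example. The `sync` flag only loosely mirrors the batch vs periodic distinction (sync before acknowledging, or leave syncing to a later step).

```python
import json
import os
import tempfile


class CommitLog:
    """Hypothetical append-only log: inserts are written here before being acknowledged."""

    def __init__(self, path):
        self.path = path
        self.fh = open(path, "a", encoding="utf-8")

    def append(self, key, value, sync=True):
        # Records are stored in insertion order, one JSON object per line.
        self.fh.write(json.dumps({"key": key, "value": value}) + "\n")
        self.fh.flush()
        if sync:
            # Force the record to disk before returning (that is, before acknowledging).
            os.fsync(self.fh.fileno())

    def close(self):
        self.fh.close()


def replay(path):
    """Rebuild in-memory state after a crash by re-applying the log in order."""
    state = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            record = json.loads(line)
            state[record["key"]] = record["value"]
    return state


log_path = os.path.join(tempfile.mkdtemp(), "commitlog.jsonl")
log = CommitLog(log_path)
log.append("richard", {"email": "rlow@acunu.com"})
log.append("richard", {"email": "richard@example.com"})  # made-up later overwrite
log.close()

recovered = replay(log_path)
```

Because replay applies records in insertion order, the later write to the same key wins, which is what the in-memory state held before the simulated crash.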
  • 8. Memtable
  • 9. Memtable
      • In-memory store of insertions
      • ConcurrentSkipListMap
      • When too large, flushed to disk
      • Ensures all writes to disk are sequential
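A rough sketch of the memtable behaviour described above. Python has no ConcurrentSkipListMap, so this example swaps in a plain dict and sorts at flush time; the `Memtable` class, the `max_entries` threshold, and the list-of-tuples "SSTable" are assumptions for illustration only.

```python
class Memtable:
    """Hypothetical in-memory store that is flushed as one sorted run when it grows too large."""

    def __init__(self, max_entries=3):
        self.max_entries = max_entries
        self.data = {}

    def put(self, key, value):
        # Overwrites replace the earlier value in memory; nothing is written to disk yet.
        self.data[key] = value
        return len(self.data) >= self.max_entries  # True when a flush is due

    def flush(self):
        # Emit every entry in key order; written out in one pass, this is a sequential write.
        sstable = sorted(self.data.items())
        self.data = {}
        return sstable


memtable = Memtable(max_entries=3)
sstables = []
for key, value in [("k_2", "c"), ("k_0", "a"), ("k_2", "C"), ("k_1", "b")]:
    if memtable.put(key, value):
        sstables.append(memtable.flush())
```

Inserts arrive in arbitrary key order, but the flushed run is sorted, so the random-order work stays in memory and the disk only ever sees sequential output.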
  • 10. SSTable [Same diagram: Insert, Commit log, Memtable, SSTable { Bloom filter, Index, Data }]
  • 11. SSTables
      • Stores actual data, sorted by key
      • Contains a Bloom filter and index to help find keys
      • Read only
  • 12. Bloom filters
      • Probabilistic data structure
      • Answers membership queries:
          • 'Does the set contain x?'
      • Can give false positives, never false negatives
      • Space efficient
      • Typical size: 1 byte per key
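To make the false-positive / no-false-negative property concrete, here is a minimal Bloom filter sketch. The bit-array size, the number of hash functions, and the use of SHA-256 to derive positions are assumptions chosen for readability; they are not the parameters or hash function Cassandra uses.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus several hash-derived positions per key."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bloom = BloomFilter()
for i in range(100):
    bloom.add(f"k_{i}")

all_found = all(bloom.might_contain(f"k_{i}") for i in range(100))
```

Every added key sets its bits, so a later lookup for that key always finds them set: no false negatives. A key that was never added can still collide with bits set by others, which is where false positives come from.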
  • 13. How it works together [Diagram: the three parts of an SSTable. Bloom filter: bit array 011010111010010. Index: k_0 -> 0, k_128 -> 4582, k_256 -> 9242. Data: rows k_0, k_1, k_2, k_3, ... stored in key order]
  • 14. How it works together [Same diagram, annotated. Bloom filter: "Contains x?" Index: "Where is x?" Data: "Retrieve x"]
  • 15. How it works together [Same diagram, with "Memory" and "Disk" labels added]
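The index on these slides lists only some keys (k_0, k_128, k_256) with their offsets, which suggests a sampled index. The following is a sketch under that assumption: `INDEX_INTERVAL`, `build_sstable`, and `lookup` are hypothetical names, and list positions stand in for byte offsets in the data file.

```python
from bisect import bisect_right

INDEX_INTERVAL = 128  # assumption, read off the k_0 / k_128 / k_256 sample on the slide


def build_sstable(items):
    """Return (rows sorted by key, sampled index of (key, position) pairs)."""
    rows = sorted(items)
    index = [(rows[i][0], i) for i in range(0, len(rows), INDEX_INTERVAL)]
    return rows, index


def lookup(rows, index, key):
    # "Where is x?": find the last sampled key that is <= the wanted key.
    sampled_keys = [k for k, _ in index]
    slot = bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # the key sorts before the first row
    start = index[slot][1]
    # "Retrieve x": scan forward from that position, at most one interval.
    for row_key, value in rows[start:start + INDEX_INTERVAL]:
        if row_key == key:
            return value
        if row_key > key:
            break
    return None


rows, index = build_sstable((f"k_{i:04d}", f"value-{i}") for i in range(300))
found = lookup(rows, index, "k_0200")
missing = lookup(rows, index, "k_9999")
```

Keeping only every 128th key in the index keeps it small enough to hold in memory, at the cost of a short scan in the data file for each lookup.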
  • 16. Point queries [Diagram: several Memtables (k_0, k_1, k_2 -> ...) above several SSTables, each with an index (k_0 -> 0, k_128 -> 4582, k_256 -> 9242) and data rows k_0, k_1, k_2, k_3, ...]
  • 17. Point queries [Same diagram, with step: 1. Query filter]
  • 18. Point queries [Same diagram, with steps: 1. Query filter, 2. Find location]
  • 19. Point queries [Same diagram, with steps: 1. Query filter, 2. Find location, 3. Read data]
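The three steps on the point-query slides can be sketched as follows. To keep the example short, a Python set stands in for each SSTable's Bloom filter and a dict lookup stands in for the index and data file; both are simplifications. Resolving conflicting versions by write timestamp is an assumption added here, since the slides do not say how versions are reconciled.

```python
def make_sstable(rows):
    """rows maps key -> (timestamp, value). The set is a stand-in for a Bloom filter."""
    return {"filter": set(rows), "rows": dict(rows)}


def point_query(key, memtables, sstables):
    candidates = []
    disk_reads = 0
    for memtable in memtables:
        if key in memtable:
            candidates.append(memtable[key])
    for sstable in sstables:
        # 1. Query filter: skip SSTables that cannot contain the key.
        if key not in sstable["filter"]:
            continue
        # 2. Find location and 3. Read data, collapsed into one dict lookup here.
        disk_reads += 1
        if key in sstable["rows"]:
            candidates.append(sstable["rows"][key])
    if not candidates:
        return None, disk_reads
    # Assumed rule: the version with the highest timestamp wins.
    _, value = max(candidates, key=lambda cell: cell[0])
    return value, disk_reads


memtables = [{"k_1": (30, "new")}]
sstables = [
    make_sstable({"k_0": (10, "a"), "k_1": (10, "old")}),
    make_sstable({"k_2": (20, "c")}),
]

hit = point_query("k_1", memtables, sstables)
miss = point_query("k_9", memtables, sstables)
```

The `disk_reads` counter shows the point of the filter step: the lookup for `k_1` touches one SSTable rather than two, and the lookup for a missing key touches none.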
  • 20. Range queries
      • Bloom filters are useless here
      • Use index to locate portion of SSTable
      • Read data, merge results
      • Necessary to look up in every SSTable data file
      • Disk I/O proportional to the number of SSTables
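A sketch of the range-query steps above: slice each SSTable by key range, then merge the sorted slices. The function name `range_query`, the half-open range `[start, end)`, and the rule that a newer SSTable wins for duplicate keys are assumptions for this example.

```python
import heapq
from bisect import bisect_left


def range_query(sstables, start, end):
    """sstables: list, oldest first, of lists of (key, value) sorted by key.

    Returns rows with start <= key < end, taking the newest SSTable's value
    when a key appears in more than one.
    """
    slices = []
    for age, rows in enumerate(sstables):
        keys = [k for k, _ in rows]
        # Locate the portion of this SSTable that covers the range.
        lo, hi = bisect_left(keys, start), bisect_left(keys, end)
        # Negating age makes newer SSTables sort first among equal keys.
        slices.append([(k, -age, v) for k, v in rows[lo:hi]])
    result = []
    # Every SSTable contributes a slice, so work grows with the number of SSTables.
    for key, _, value in heapq.merge(*slices):
        if not result or result[-1][0] != key:
            result.append((key, value))
    return result


sstables = [
    [("k_0", "a"), ("k_1", "b"), ("k_3", "d")],  # older
    [("k_1", "B"), ("k_2", "c")],                # newer
]
rows_in_range = range_query(sstables, "k_1", "k_3")
```

Unlike the point query, there is no filter step that lets an SSTable be skipped, which is why the cost tracks the SSTable count.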
  • 21. Compaction
      • Merges SSTables
      • Removes overwrites and obsolete tombstones
      • Improves range query performance
      • Major compaction creates one SSTable
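Compaction as described above can be sketched as a merge that keeps the newest version of each key and drops tombstones once they are old enough. Representing a tombstone as a `None` value, carrying a timestamp on every cell, and the `gc_before` cutoff are assumptions for this illustration. A real implementation would stream the merge rather than load everything into a dict.

```python
TOMBSTONE = None  # assumed marker for a deleted value


def compact(sstables, gc_before):
    """Merge SSTables into one sorted run.

    Each SSTable is a list of (key, (timestamp, value)) pairs. Only the newest
    version of each key survives, and tombstones older than gc_before are dropped.
    """
    merged = {}
    for rows in sstables:
        for key, (timestamp, value) in rows:
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    return sorted(
        (key, cell)
        for key, cell in merged.items()
        if not (cell[1] is TOMBSTONE and cell[0] < gc_before)
    )


older = [("k_0", (1, "a")), ("k_1", (1, "b")), ("k_2", (1, "c"))]
newer = [("k_1", (5, "B")), ("k_2", (6, TOMBSTONE)), ("k_3", (9, TOMBSTONE))]

compacted = compact([older, newer], gc_before=8)
```

In this run the overwrite of `k_1` and the old tombstone for `k_2` disappear, while the recent tombstone for `k_3` is kept because it is not yet obsolete under the assumed cutoff.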
  • 22. Write optimised
      • All writes are sequential on disk
      • Each write is written multiple times during compactions
      • Bloom filters mean approx. one I/O per read
      • Avoid a read-modify-write data model
  • 23. Scaling
      • In memory:
          • Buffers
          • Memtables
          • Bloom filters
          • Index
      • If not enough memory, significant performance impact
  • 24. Repair: Merkle Trees
      • Repair builds a Merkle tree
      • Compared with replicas
      • Efficient
      • If differences are found, portions of SSTables are streamed
      • Requires full disk scan to build
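A small sketch of why comparing Merkle trees is efficient: two replicas exchange hashes, and only the leaves whose hashes differ need their data streamed. The bucketing of keys into leaves by hash (`bucket_of`), the leaf count, and the use of SHA-256 are assumptions for the example, not the way Cassandra partitions its ranges.

```python
import hashlib


def bucket_of(key, num_leaves):
    """Assign a key to a leaf; a stand-in for splitting the key range."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_leaves


def build_tree(rows, num_leaves=4):
    """Return the tree as a list of levels, leaves first. num_leaves must be a power of two."""
    leaves = [hashlib.sha256() for _ in range(num_leaves)]
    for key in sorted(rows):  # building the tree reads every row
        leaves[bucket_of(key, num_leaves)].update(f"{key}={rows[key]};".encode("utf-8"))
    levels = [[leaf.digest() for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([hashlib.sha256(prev[i] + prev[i + 1]).digest()
                       for i in range(0, len(prev), 2)])
    return levels


def differing_leaves(a, b, level=None, index=0):
    """Walk both trees from the root, descending only where hashes differ."""
    if level is None:
        level = len(a) - 1
    if a[level][index] == b[level][index]:
        return []
    if level == 0:
        return [index]
    return (differing_leaves(a, b, level - 1, 2 * index)
            + differing_leaves(a, b, level - 1, 2 * index + 1))


replica_a = {f"k_{i}": f"v{i}" for i in range(20)}
replica_b = dict(replica_a)
replica_b["k_3"] = "stale"  # one row is out of date on the second replica

in_sync = differing_leaves(build_tree(replica_a), build_tree(dict(replica_a)))
out_of_sync = differing_leaves(build_tree(replica_a), build_tree(replica_b))
```

When the roots match, the comparison stops after one hash. When they differ, only the subtree containing the stale row is explored, so only that leaf's rows would need to be streamed.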
  • 25. Snapshot
      • For backup, want a consistent set of SSTables
      • nodetool snapshot does this
      • Creates hard links to existing SSTables
      • Implies the snapshot only takes up extra disk space after a few compactions, once the linked SSTables have been replaced
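The hard-link idea can be illustrated with `os.link`. This is a sketch of the mechanism only, not what `nodetool snapshot` runs internally; the `snapshot` function, the directory layout, and the file name are hypothetical, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def snapshot(data_dir, snapshot_dir):
    """Hard-link every file in data_dir into snapshot_dir; no data is copied."""
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        source = os.path.join(data_dir, name)
        if os.path.isfile(source):
            os.link(source, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


data_dir = tempfile.mkdtemp()
original = os.path.join(data_dir, "example-1-Data.db")  # hypothetical file name
with open(original, "w", encoding="utf-8") as fh:
    fh.write("k_0,k_1,k_2")

snapshot_dir = os.path.join(data_dir, "snapshots", "backup")
linked = snapshot(data_dir, snapshot_dir)
link_path = os.path.join(snapshot_dir, "example-1-Data.db")
shares_storage = os.path.samefile(original, link_path)

# A later compaction would delete the original; the snapshot link keeps the data alive.
os.remove(original)
with open(link_path, encoding="utf-8") as fh:
    surviving = fh.read()
```

This works because SSTables are read only: a link to an SSTable is a consistent copy for as long as it exists, and it starts costing its own disk space only once the live file it was linked from has been removed.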
  • 26. Summary
      • How writes end up on disk
      • How point queries and range queries find the data
      • Implications
      • Repair
      • Snapshot