Building a Key-Value Store with Cassandra
Kiwi PyCon 2010
Aaron Morton @aaronmorton
Weta Digital
Why Cassandra?
• Part of a larger project started earlier this year to build new systems for code running on the render farm of 35,000 cores
• Larger project goals were Scalability, Reliability, Flexible Schema
How about MySQL ?
• It works. But...
• Schema changes
• Write redundancy
• Query language mismatch
• So we went looking for the right tool for the job
Redis ?
• Fast, flexible. But...
• Single core limit
• Replication, but no cluster (it's coming)
• Limited support options
CouchDB ?
• Schema free, scalable (sort of), redundant (sort of). But...
• Single write thread limit
• Replication, but no cluster (it's coming)
• Low consistency with asynchronous replication
Cassandra ?
• Just right, perhaps. Let's see...
• Highly available
• Tuneable synchronous replication
• Scalable writes and reads
• Schema free (sort of)
• Lots of new mistakes to be made
Availability
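• Row data is kept together and replicated around the cluster
• Replication Factor is configurable
• Partitioner determines the position of a row key in the distributed hash table
• Replication Strategy determines where in the cluster to place the replicas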
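To make the partitioner and replication strategy roles concrete, here is a minimal pure-Python sketch of a token ring. It assumes an MD5-based partitioner and a simple "next nodes clockwise" replica placement; the node names, the four-node ring, and the helper names `token_for` and `replicas_for` are hypothetical and not part of Cassandra's API.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 128  # MD5 digests are 128 bits wide


def token_for(key: str) -> int:
    """Partitioner stand-in: hash a row key to a position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas_for(key: str, ring: list, replication_factor: int) -> list:
    """Replication strategy stand-in: start at the first node whose token is
    >= the key's token (wrapping around), then walk clockwise."""
    tokens = [token for token, _ in ring]
    start = bisect_left(tokens, token_for(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical four-node cluster with evenly spaced tokens.
ring = sorted((i * RING_SIZE // 4, f"node{i}") for i in range(4))

print(replicas_for("Fred", ring, replication_factor=3))
```

With a Replication Factor of 3 on this ring, every row lands on three distinct nodes, so any single node can be down without losing access to the row.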
Consistency
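• Each read or write request specifies a Consistency Level
• Individual nodes may be inconsistent with respect to others
• Reads may give consistent results while some nodes have inconsistent values
• The entire cluster will eventually move to a state where there is one version of each value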
Consistency
• R + W > N
• R = Read Consistency
• W = Write Consistency
• N = Replication Factor
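A short worked check of the rule, as a sketch: R and W are taken to be the number of replicas that must answer a read or acknowledge a write, and a quorum is assumed to be N // 2 + 1. The function names are illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """A majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(r: int, w: int, n: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


N = 3
for r, w in [(1, 1), (quorum(N), quorum(N)), (1, N), (N, 1)]:
    print(f"R={r} W={w} N={N} overlap={reads_see_latest_write(r, w, N)}")
```

With N = 3, reading and writing at quorum gives 2 + 2 > 3, so at least one replica in every read has seen the latest write. Reading and writing at a level of one gives 1 + 1 = 2, which is not greater than 3, so a read may miss the latest write until the replicas converge.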
Scale
• Distributed hash table
• Scale throughput and capacity with more nodes, more disk, more memory
• Adding or removing nodes is an online operation
• Gossip based protocol for discovery
Data Model
• Column orientated
• Denormalise
• Cassandra is an index building machine
• Simple explanation: a row has a key and stores an ordered hash in one or more Column Families
Data Model
• Keyspace
• Row / Key
• Column Family or Super Column Family
• Column
Data Model
Row key | User CF                   | Posts SCF
Fred    | email:fred@..., dob:04/03 | post_1:{title: foo, body: bar}
Bob     | email:bob                 | post_100:{title: monkeys, body: naughty}
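One way to picture the example above is as nested dictionaries. This is only a sketch of the shape of the data, not how Cassandra stores it; the `keyspace` variable and the `columns_in_order` helper are illustrative, and the elided email address is kept exactly as it appears on the slide.

```python
keyspace = {
    # Column Family: row key -> {column name: value}
    "User": {
        "Fred": {"email": "fred@...", "dob": "04/03"},
        "Bob": {"email": "bob"},
    },
    # Super Column Family: row key -> {super column name: {column name: value}}
    "Posts": {
        "Fred": {"post_1": {"title": "foo", "body": "bar"}},
        "Bob": {"post_100": {"title": "monkeys", "body": "naughty"}},
    },
}


def columns_in_order(row: dict) -> list:
    """Columns come back ordered by name, hence 'ordered hash'."""
    return sorted(row.items())


print(columns_in_order(keyspace["User"]["Fred"]))
print(keyspace["Posts"]["Bob"]["post_100"]["title"])
```

A Super Column Family adds one more level of nesting, which is why each post can carry its own title and body under a single row key.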
API
• Thrift
• Avro (beta)
• Auto generated bindings for many languages
• Stateful connections
• Python wrappers pycassa, Telephus (twisted)
API
• Client supplied time stamp for all mutations
• Client supplied Consistency Level for all mutations and reads
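The client-supplied time stamp is what lets replicas agree on which version of a column is current: the version with the highest time stamp wins. A minimal sketch follows, assuming microseconds since the epoch as the time stamp unit; the `Column` type and the `reconcile` helper are hypothetical, and the tie-breaking here is arbitrary.

```python
import time
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # supplied by the client, not by the server


def client_timestamp() -> int:
    """Microseconds since the epoch (an assumed convention)."""
    return int(time.time() * 1_000_000)


def reconcile(a: Column, b: Column) -> Column:
    """Keep whichever version carries the higher client time stamp."""
    return a if a.timestamp >= b.timestamp else b


older = Column("email", "fred@old.example", timestamp=1000)
newer = Column("email", "fred@new.example", timestamp=2000)
print(reconcile(older, newer).value)
```

Because the clients set the time stamps, clock drift between application servers can decide which write wins, so keeping their clocks synchronised matters.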
API
• insert(key, column_family, super_column, column, value)
• get(key, column_family, super_column, column)
• remove(key, column_family, super_column, column)
API
• Slicing columns or super columns
• list of names
• start, finish, count, reversed
• get_slice() to slice one row
• multiget_slice() to slice multiple rows
• get_range_slices() to slice rows and columns
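The start, finish, count, reversed form of a slice can be sketched against a plain dictionary standing in for one row. This is a toy model rather than the Thrift call: the `slice_columns` name and its parameters are assumptions, bounds are treated as inclusive, and an empty string is treated as "no bound".

```python
def slice_columns(row, start="", finish="", count=100, reverse=False):
    """Return up to `count` (name, value) pairs between start and finish.

    When reverse is True the columns are walked from high to low, so
    `start` is the upper bound and `finish` the lower bound.
    """
    selected = []
    for name in sorted(row, reverse=reverse):
        if len(selected) >= count:
            break
        if reverse:
            in_range = (start == "" or name <= start) and (finish == "" or name >= finish)
        else:
            in_range = (start == "" or name >= start) and (finish == "" or name <= finish)
        if in_range:
            selected.append((name, row[name]))
    return selected


row = {"a": 1, "b": 2, "c": 3, "d": 4}
print(slice_columns(row, start="b", finish="c"))
print(slice_columns(row, start="c", count=2, reverse=True))
```

Paging through a wide row follows from this: ask for `count` columns, then use the last name returned as the next `start`.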
API
• Slicing keys
• start key, finish key, count
• Partitioner affects key order
• get_range_slices() to slice rows and columns
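Why the partitioner matters for key slices can be seen by sorting the same keys two ways. This sketch assumes a hashing partitioner orders rows by the MD5 of the key and an order-preserving one by the key itself; the sample keys and the `md5_token` helper are illustrative.

```python
import hashlib


def md5_token(key: str) -> int:
    """Ring position of a key under a hashing partitioner (illustrative)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


keys = ["apple", "banana", "cherry", "damson"]

order_preserving = sorted(keys)        # rows ordered by the key itself
hashed = sorted(keys, key=md5_token)   # rows ordered by the hash of the key

print("order preserving:", order_preserving)
print("hashed:          ", hashed)
```

Under the hashed ordering a start key and finish key still bound a range, but the rows in between are not alphabetical neighbours, so range scans by key prefix are only meaningful with an order-preserving partitioner.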
API
• batch_mutate()
• multiple rows and CFs
• delete or insert / update
• Individual mutations are atomic
• Request is not atomic, no rollback
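The "no rollback" point is easy to miss, so here is a toy in-memory model of it. It is not the Thrift batch_mutate() signature: the mutation map layout (row key, then column family, then a list of operations) and the tuple form of each operation are assumptions made for illustration, and the failure is simulated with an unknown column family.

```python
def batch_mutate(store, mutation_map):
    """Apply each mutation in turn. Nothing is undone if a later one fails."""
    for row_key, by_cf in mutation_map.items():
        for cf_name, mutations in by_cf.items():
            row = store[cf_name].setdefault(row_key, {})  # KeyError if CF is unknown
            for op, column, value in mutations:
                if op == "insert":
                    row[column] = value          # insert and update are the same thing
                elif op == "delete":
                    row.pop(column, None)
                else:
                    raise ValueError(f"unknown operation: {op}")


store = {"Object": {}}
try:
    batch_mutate(store, {
        "k1": {
            "Object": [("insert", "body", "x")],
            "Missing": [("insert", "a", "b")],   # fails after the first CF was written
        }
    })
except KeyError:
    pass

print(store)  # the first mutation is still there
```

The practical consequence is that the client has to make a batch safe to retry, since a failed request may have been partly applied.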
Our Application
Varnish
Nginx
Tornado
Cassandra, RabbitMQ
Our Application
• Similar to Amazon S3.
• REST API.
• Databases, Buckets, Keys+Values.
Our Column Families
• Database (super)
• Bucket (super)
• Bucket Index
• Object
• Object Index (super)
Our API
http://db_name.wetafx.co.nz/bucket/key
PUT Object
• /bucket/object
• batch_mutate()
• one row in Object CF with columns for meta and the body
• one column in ObjectIndex CF row for the bucket
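A sketch of the two writes a PUT turns into, built as plain dictionaries rather than real Thrift mutations. The row key scheme (`bucket/key`), the `meta:` column prefix, and the function name are all hypothetical, and the index is modelled as a flat row for brevity.

```python
import time


def put_object_mutations(bucket, key, body, meta):
    """Build the writes for one PUT: the object row and its index entry."""
    timestamp = int(time.time() * 1_000_000)  # client-supplied time stamp
    object_row_key = f"{bucket}/{key}"         # hypothetical row key scheme

    object_columns = {f"meta:{name}": (value, timestamp) for name, value in meta.items()}
    object_columns["body"] = (body, timestamp)

    return {
        "Object": {object_row_key: object_columns},
        # One column per object in the bucket's index row; the column name is
        # the object key, so the index stays sorted by key.
        "ObjectIndex": {bucket: {key: ("", timestamp)}},
    }


mutations = put_object_mutations(
    "photos", "cat.jpg", b"...", {"content-type": "image/jpeg"}
)
print(sorted(mutations["Object"]["photos/cat.jpg"]))
print(list(mutations["ObjectIndex"]["photos"]))
```

Sending both writes in one batch keeps the round trips down, but since the request is not atomic the index and the object row can briefly disagree.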
List Objects
• /bucket_name?start=foo
• get_slice()
• for the bucket row in ObjectIndex CF
• if needed, multiget_slice() to “join” to the Object CF
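Listing can be sketched the same way: slice the bucket's index row from the start key, then optionally look up each object row as the client-side "join". The dictionaries stand in for the two column families, and the `bucket/key` row key scheme and the function name are assumptions.

```python
def list_objects(object_index, objects, bucket, start="", count=100, with_bodies=False):
    """Return (name, object row or None) pairs for one bucket, in key order."""
    # Equivalent of get_slice() on the bucket's index row.
    index_row = object_index.get(bucket, {})
    names = sorted(name for name in index_row if name >= start)[:count]
    if not with_bodies:
        return [(name, None) for name in names]
    # Equivalent of multiget_slice(): one lookup per object row.
    return [(name, objects.get(f"{bucket}/{name}")) for name in names]


object_index = {"photos": {"a.jpg": "", "b.jpg": "", "c.jpg": ""}}
objects = {"photos/b.jpg": {"body": b"B"}, "photos/c.jpg": {"body": b"C"}}

print(list_objects(object_index, objects, "photos", start="b"))
print(list_objects(object_index, objects, "photos", start="b", with_bodies=True))
```

Keeping the listing in its own index row means a bucket listing reads one row, however many objects the bucket holds.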
Delete Bucket
• /bucket_name
• get_slice() on ObjectIndex CF
• batch_mutate() to delete Object CF and ObjectIndex CF
• delete Bucket CF row
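The delete follows the same order as the bullets above: read the index, remove the objects and the index row, then remove the bucket row. A toy version over dictionaries, with the same hypothetical `bucket/key` row key scheme and an illustrative function name:

```python
def delete_bucket(store, bucket):
    """Delete every object in a bucket, its index row, then the bucket row."""
    # get_slice() on the ObjectIndex row gives the object names.
    names = list(store["ObjectIndex"].get(bucket, {}))
    # batch_mutate() equivalent: drop each object row and the index row.
    for name in names:
        store["Object"].pop(f"{bucket}/{name}", None)
    store["ObjectIndex"].pop(bucket, None)
    # The bucket row goes last, so a failure part way through leaves a
    # bucket that can still be found and deleted again.
    store["Bucket"].pop(bucket, None)
    return names


store = {
    "Bucket": {"photos": {"owner": "fred"}},
    "ObjectIndex": {"photos": {"a.jpg": "", "b.jpg": ""}},
    "Object": {"photos/a.jpg": {"body": b"A"}, "photos/b.jpg": {"body": b"B"}},
}
print(delete_bucket(store, "photos"))
print(store)
```

Since the request is not atomic, the whole function is written so that running it twice is harmless.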
Thanks
• http://wetafx.co.nz
• http://cassandra.apache.org/

Building a distributed Key-Value store with Cassandra

Slides from my talk at Kiwi PyCon in 2010.

Covers why we chose Cassandra, an overview of its features and data model, and how we implemented our application.
