4. History
• Started at Facebook
• Historically builds on
• Dynamo for distribution: consistent hashing, eventual consistency
• BigTable for disk storage model
Amazon’s Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Google’s BigTable: http://research.google.com/archive/bigtable.html
5. Cassandra is
• A distributed database written in Java
• Scalable
• Masterless, no single point of failure
• Tunable consistency
• Network topology aware
6. Cassandra Data Model
• Original “Map of Maps” schema
• row key ➞ Map<ColumnName, Value>
• Now (in CQL):
• Keyspace = Database
• ColumnFamily = Table
• Row = Partition
• Column = Cell
• Data types
• strings, booleans, integers, decimals
• collections: list, set, map
• not indexable, not individually queryable
• counters
• custom types
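As a hedged sketch of the mapping above (keyspace, table, and column names here are hypothetical), in CQL:

```sql
-- Hypothetical keyspace ("database") and table ("column family"):
CREATE KEYSPACE IF NOT EXISTS demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE IF NOT EXISTS demo.users (
  username text PRIMARY KEY,   -- partition key: one partition ("row") per user
  email    text,
  age      int,
  balance  decimal,
  active   boolean,
  tags     set<text>           -- collection: not indexable, not individually queryable
);

-- Counters must live in dedicated counter tables:
CREATE TABLE IF NOT EXISTS demo.page_views (
  page  text PRIMARY KEY,
  views counter
);
```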
7. Cassandra Replication Factor & Consistency Levels
• CAP Theorem:
• Consistency
• Availability
• Partition tolerance (in the face of network partitions)
Original article: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
Review 12 years later: http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
Fun with distributed systems under partitions: http://aphyr.com/tags/jepsen
8. Cassandra Replication Factor & Consistency Levels
• RF: designated per keyspace
• CL:
• Writes: ANY, ONE, QUORUM, ALL
• Reads: ONE, QUORUM, ALL
• Consistent reads & writes are achieved when CL(W) + CL(R) > RF
• QUORUM = RF/2 + 1 (integer division; e.g. RF = 3 ➞ QUORUM = 2, and 2 + 2 > 3)
• Additional QUORUM variants:
• LOCAL_QUORUM: quorum of replica nodes within the same DC
• EACH_QUORUM: quorum of replica nodes in each DC
Cassandra parameters calculator: http://www.ecyrd.com/cassandracalculator/
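To make the arithmetic concrete — a sketch (the keyspace name is hypothetical); RF is fixed at keyspace creation, while CL is chosen per request (e.g. via the driver, or cqlsh's CONSISTENCY command):

```sql
-- RF is designated per keyspace:
CREATE KEYSPACE demo
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};

-- With RF = 3: QUORUM = 3/2 + 1 = 2 (integer division).
-- QUORUM writes + QUORUM reads: 2 + 2 = 4 > 3, so reads see the latest write.

-- In cqlsh, the consistency level is set per session:
--   CONSISTENCY QUORUM;
```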
9. Masterless design
• All nodes in the cluster are equal
• Gossip protocol among servers
• Adding / removing nodes is easy
• Clients are cluster-aware
From “Dynamo: Amazon’s Highly Available Key-value Store”: “Traditional replicated relational database systems focus on the problem of guaranteeing strong consistency to replicated data. Although strong consistency provides the application writer a convenient programming model, these systems are limited in scalability and availability.”
[Figure 2 from the Dynamo paper: partitioning and replication of keys in the Dynamo ring — nodes B, C and D store keys in range (A, B), including key K.]
10. Write path
• Storage is log-structured; updates do not overwrite, deletes do not remove
• Commit log: sequential disk access
• Memtables: in-memory data structure (partially off-heap since 2.1b2)
• Memtables are flushed to SSTable on disk
• Compaction: merge SSTables, remove tombstones
11. Read path
• For each SSTable that may contain a partition key:
• Bloom filters: estimate probability of locating partition data per SSTable
• Locate offset in SSTable
• Sequential read in SSTable (if query involves several columns)
• A partition’s columns are merged from several SSTables / memtables, as column updates never overwrite data
13. CQL
• Cassandra Query Language
• Client API for Cassandra
• CQL3 available since Cassandra 1.2
• Familiar syntax
• Easy to use
• Drivers available for Java, Python, C# and more
15. Creating a table - what happened??
• A new table was created
• It looks familiar!
• We defined the username as the primary key, therefore we can identify a row and query quickly by username
• Primary keys can be composite; the first part of the primary key is the partition key and determines the primary node for the partition
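A sketch of a composite primary key (the user_events table here is hypothetical):

```sql
CREATE TABLE user_events (
  user_id    text,
  created_at timestamp,
  event      text,
  -- user_id is the partition key; created_at is a clustering column:
  PRIMARY KEY (user_id, created_at)
);
```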
22. Insert/Update
• INSERT & UPDATE are functionally equivalent
• New in Cassandra 2.0: support for lightweight transactions (compare-and-set)
• e.g. INSERT INTO users (username, email) VALUES ('tony', 'tony@gmail.com') IF NOT EXISTS;
• Based on Paxos consensus protocol
Paxos Made Live: An Engineering Perspective: http://research.google.com/archive/paxos_made_live.pdf
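A conditional UPDATE follows the same compare-and-set pattern (column values here are hypothetical):

```sql
-- Applied only if the current email matches; the result carries an [applied] flag:
UPDATE users SET email = 'tony@newmail.com'
  WHERE username = 'tony'
  IF email = 'tony@gmail.com';
```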
23. Select query
• SELECT * FROM user_attributes;
• Selecting across several partitions can be slow
• Default LIMIT 10,000
• Can filter results with WHERE clauses on the partition key, on partition key & clustering columns, or on indexed columns
• EQ & IN operators allowed for partition keys
• EQ, <, > … operators allowed for clustering columns
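Sketches of the allowed WHERE forms, assuming a hypothetical user_events table with partition key user_id and clustering column created_at:

```sql
-- EQ and IN on the partition key:
SELECT * FROM user_events WHERE user_id = 'tony';
SELECT * FROM user_events WHERE user_id IN ('tony', 'anna');

-- Range operators on a clustering column, within one partition:
SELECT * FROM user_events
  WHERE user_id = 'tony' AND created_at > '2014-01-01';
```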
24. Select query - Ordering
• Partition keys are not ordered
• … but clustering columns are ordered
• Default ordering is mandated by clustering columns
• ORDER BY can be specified on clustering columns at query time; default order can be set WITH CLUSTERING ORDER on table creation
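A sketch, again assuming a hypothetical user_events table (user_id partition key, created_at clustering column):

```sql
-- Default order is fixed at table creation:
CREATE TABLE user_events (
  user_id    text,
  created_at timestamp,
  event      text,
  PRIMARY KEY (user_id, created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- Override at query time (clustering columns only):
SELECT * FROM user_events
  WHERE user_id = 'tony'
  ORDER BY created_at ASC;
```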
25. Secondary Indexes
• Secondary indexes allow queries using EQ or IN operators on columns other than the partition key
• Internally implemented as hidden tables
• “Cassandra's built-in indexes are best on a table having many rows that
contain the indexed value. The more unique values that exist in a particular
column, the more overhead you will have, on average, to query and maintain
the index.”
http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_when_use_index_c.html
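A sketch (the country column on the users table is hypothetical):

```sql
CREATE INDEX users_by_country ON users (country);

-- EQ is now allowed on the indexed, non-partition-key column:
SELECT * FROM users WHERE country = 'IL';
```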
27. Query Performance
• Single-partition queries are fast!
• Queries for ranges on clustering columns are fast!
• Queries for multiple partitions are slow
• Use secondary indexes with caution
33. Data access patterns
• View poll ➞ Get poll name & sorted list of answers by poll id
• User votes ➞ Insert answer with user id, poll id, answer id, timestamp
• View result ➞ Retrieve counts per poll & answer
34. Poll & answers
• POLL: POLL_ID, TEXT
• POLL_ANSWER: POLL_ID, ANSWER_ID, SORT_ORDER
• ANSWER: ANSWER_ID, TEXT
35. Poll & answers
• Need 3 queries to display a poll
• 2 by PK EQ
• 1 for multiple rows by PK IN
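The three queries might be sketched as follows (table and column names are hypothetical renderings of the model above):

```sql
-- 2 by PK EQ:
SELECT text FROM poll WHERE poll_id = 42;
SELECT answer_id, sort_order FROM poll_answer WHERE poll_id = 42;

-- 1 for multiple rows by PK IN:
SELECT answer_id, text FROM answer WHERE answer_id IN (1, 2, 3);
```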
45. Links
• http://cassandra.apache.org
• http://planetcassandra.org/
Cassandra binary distributions, use cases, webinars
• http://www.datastax.com/docs
Excellent documentation for all things Cassandra (and DSE)
• http://www.slideshare.net/patrickmcfadin/cassandra-20-and-timeseries
Cassandra 2.0 new features & time series modeling