Cassandra. Rob Keisler, CSCI 638 -- Summer 2011

  1. Cassandra. Rob Keisler, CSCI 638 -- Summer 2011
  2. What is Cassandra?
     ● A distributed storage system with a flexible schema and high write throughput
     ● Developed by Facebook; turned over to Apache
     ● At its core, Cassandra borrows from both:
       ○ Amazon's Dynamo infrastructure
       ○ Google's BigTable data model
  3. Cassandra's Infrastructure
  4. Cassandra's Data Model
     ● Rows (each identified by a key)
     ● Column Families
     ● Columns and Super Columns
       ○ User can specify sorting by name or timestamp

     Column                  SuperColumn
       Byte[] Name             Byte[] Name
       Byte[] Value            List<Column> Columns
       Int64 Timestamp

     Rows of columns:
       KeyA: ColumnA, ColumnB, ColumnC
       KeyB: ColumnX, ColumnY, ColumnZ

     Rows of super columns:
       KeyA: SuperColumnI, SuperColumnJ
       KeyB: SuperColumnM, SuperColumnN
  5. Cassandra's Data Model (in JSON)
     ● Key > Column Family > Column

     {
        "keyA":{
           "Users":{
              "emailAddress":{"timestamp":"1", "value":"foo@bar.com"},
              "webSite":{"timestamp":"4", "value":"http://bar.com"}
           },
           "Stats":{
              "visits":{"timestamp":"3", "value":"243"}
           }
        },
        "keyB":{
           "Users":{
              "emailAddress":{"timestamp":"1", "value":"user2@bar.com"},
              "twitter":{"timestamp":"4", "value":"user2"}
           }
        }
     }
  6. Cassandra's Data Model (in JSON)
     ● Key > Column Family > Super Column > Column

     {
       "KeyA": {
         "Tags": {
           "cassandra": {
             "incubator": {"timestamp": "http://incubator.apache.org/cassandra/"},
             "jira": {"timestamp": "http://issues.apache.org/jira/browse/CASSANDRA"}
           },
           "thrift": {
             "jira": {"timestamp": "http://issues.apache.org/jira/browse/THRIFT"}
           }
         }
       }
     }
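
The Key > Column Family > Column nesting shown in JSON can also be sketched as a small in-memory structure. This is only an illustration under stated assumptions, not Cassandra's storage code: the names `Column`, `ColumnFamilyStore`, `insert`, and `get` are hypothetical, super columns are left out, and a write replaces a column only when its timestamp is at least as new as the stored one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column as described on the data model slide: name, value, timestamp."""
    name: str
    value: str
    timestamp: int


class ColumnFamilyStore:
    """Toy store (hypothetical): row key -> column family -> column name -> Column."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, family, column):
        columns = self.rows.setdefault(key, {}).setdefault(family, {})
        current = columns.get(column.name)
        # Keep only the newest version of a column; no history is retained.
        if current is None or column.timestamp >= current.timestamp:
            columns[column.name] = column

    def get(self, key, family, sort_by="name"):
        # sort_by is "name" or "timestamp", mirroring the user-chosen sort order.
        columns = self.rows.get(key, {}).get(family, {}).values()
        return sorted(columns, key=lambda c: getattr(c, sort_by))


store = ColumnFamilyStore()
store.insert("keyA", "Users", Column("webSite", "http://bar.com", 4))
store.insert("keyA", "Users", Column("emailAddress", "foo@bar.com", 1))
print([c.name for c in store.get("keyA", "Users")])
```

A super column family would add one more level of nesting between the column family and the columns.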
  7. Differences from Dynamo
     ● Partitioning
       ○ Dynamo assigns virtual nodes on the hash ring according to the capacity of the host node
       ○ Cassandra places host nodes directly on the hash ring, examines load information, and moves lightly loaded nodes along the ring to relieve heavily loaded ones
     ● Replication (Cassandra's policies)
       ○ "Rack Unaware"
       ○ "Rack Aware"
       ○ "Datacenter Aware"
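
The partitioning idea can be sketched with a toy hash ring. Everything here is an assumption made for illustration: the `Ring` class, its `move` method, the ring size of 100, and the use of integer tokens passed in directly instead of hashed keys. Replica selection simply walks clockwise to the next nodes, which corresponds to the "Rack Unaware" policy; the rack and datacenter aware policies are not modeled.

```python
import bisect


class Ring:
    """Toy hash ring (hypothetical): one token per host node, positions 0..99."""

    RING_SIZE = 100

    def __init__(self, tokens):
        self.tokens = dict(tokens)  # node name -> token on the ring

    def replicas(self, key_token, n=3):
        ring = sorted((t, node) for node, t in self.tokens.items())
        positions = [t for t, _ in ring]
        # The first node whose token is >= the key's token owns the key;
        # the modulo below wraps past the last node back to the first.
        start = bisect.bisect_left(positions, key_token % self.RING_SIZE)
        count = min(n, len(ring))
        return [ring[(start + i) % len(ring)][1] for i in range(count)]

    def owner(self, key_token):
        return self.replicas(key_token, 1)[0]

    def move(self, node, new_token):
        # Re-position a lightly loaded node so it takes over part of a
        # heavily loaded neighbor's range.
        self.tokens[node] = new_token % self.RING_SIZE


ring = Ring({"a": 10, "b": 20, "c": 90})
print(ring.owner(50))   # "c" owns the wide range 21..90
ring.move("b", 55)      # "b" moves up the ring and takes over 11..55
print(ring.owner(50))   # now "b"
```

In this sketch node "c" starts out owning most of the ring, and moving "b" splits that range without adding any virtual nodes.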
  8. Differences from Dynamo
     ● Failure Detection
       ○ Dynamo uses a gossip-based protocol for membership changes; a node is assumed failed if it does not respond
       ○ Cassandra uses the same gossip-based protocol but uses a φ (phi) Accrual Failure Detector
         ■ Does not emit a boolean up or down
         ■ Emits a value which represents a suspicion level
         ■ The suspicion threshold is dynamically adjusted via the gossip messages
         ■ Sliding windows determined by arrival times
         ■ Statistical distribution model created
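
A minimal sketch of an accrual failure detector, assuming heartbeat inter-arrival times follow an exponential distribution whose mean is estimated from a sliding window. The class name `PhiAccrualDetector`, its methods, and the window size are hypothetical, and this is not Cassandra's implementation. Under that assumption, the probability that the next heartbeat arrives later than `elapsed` is `exp(-elapsed / mean)`, and phi is the negative base-10 logarithm of that probability.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy accrual detector (hypothetical): returns a suspicion level, not up/down."""

    def __init__(self, window_size=1000):
        # Sliding window of the most recent heartbeat inter-arrival times.
        self.intervals = deque(maxlen=window_size)
        self.last_arrival = None

    def heartbeat(self, now):
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_arrival
        # -log10(exp(-elapsed / mean)) simplifies to elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in (0, 1, 2, 3):
    detector.heartbeat(t)
print(detector.phi(4), detector.phi(13))  # suspicion grows with silence
```

A caller would compare the returned value against a threshold of its own choosing to decide when to treat a node as failed.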
  9. Differences from BigTable
     ● Data Model
       ○ BigTable stores <K,V> pairs in SSTables by Column Family with historical versions
       ○ Cassandra drops historical versions and adds the super column concept
     ● Storage
       ○ BigTable uses the Google File System (GFS)
       ○ Cassandra uses the local file system
  10. Cassandra: http://cassandra.apache.org/