Published in Technology
  • 1. Cassandra. Rob Keisler, CSCI 638 -- Summer 2011
  • 2. What is Cassandra?
    ● A distributed storage system with a flexible schema and high write throughput
    ● Developed by Facebook; later turned over to Apache
    ● At its core, Cassandra borrows from both:
      ○ Amazon's Dynamo infrastructure
      ○ Google's BigTable data model
  • 3. Cassandra's Infrastructure
  • 4. Cassandra's Data Model
    ● Rows (keyspace)
    ● Column Families
    ● Columns and Super Columns
      ○ User can specify sorting by name or timestamp

    Column                  SuperColumn
    Byte[] Name             Byte[] Name
    Byte[] Value            List<Column> Columns
    Int64 Timestamp

    Column family rows:
    KeyA    ColumnA         ColumnB         ColumnC
    KeyB    ColumnX         ColumnY         ColumnZ

    Super column family rows:
    KeyA    SuperColumnI    SuperColumnJ
    KeyB    SuperColumnM    SuperColumnN
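The two record layouts on this slide can be sketched in Python as follows. This is only an illustration of the slide's fields, not Cassandra's actual classes; `sort_columns` and the sample values are hypothetical names introduced here to show the "sort by name or timestamp" option.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: bytes
    value: bytes
    timestamp: int  # Int64 on the slide


@dataclass
class SuperColumn:
    name: bytes
    columns: List[Column] = field(default_factory=list)


def sort_columns(columns: List[Column], by: str = "name") -> List[Column]:
    """Hypothetical helper: order columns by name or by timestamp."""
    if by not in ("name", "timestamp"):
        raise ValueError("by must be 'name' or 'timestamp'")
    return sorted(columns, key=lambda c: getattr(c, by))


# Sample row with made-up values and timestamps.
row_a = [
    Column(b"ColumnB", b"v2", 3),
    Column(b"ColumnA", b"v1", 7),
    Column(b"ColumnC", b"v3", 1),
]
by_name = [c.name for c in sort_columns(row_a, "name")]
by_time = [c.name for c in sort_columns(row_a, "timestamp")]

# A super column simply groups a list of ordinary columns under one name.
super_i = SuperColumn(b"SuperColumnI", sort_columns(row_a, "name"))
```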
  • 5. Cassandra's Data Model (in JSON)
    ● Key > Column Family > Column

    {
      "keyA": {
        "Users": {
          "emailAddress": {"timestamp": "1", "value": ""},
          "webSite": {"timestamp": "4", "value": ""}
        },
        "Stats": {
          "visits": {"timestamp": "3", "value": "243"}
        }
      },
      "keyB": {
        "Users": {
          "emailAddress": {"timestamp": "1", "value": ""},
          "twitter": {"timestamp": "4", "value": "user2"}
        }
      }
    }
  • 6. Cassandra's Data Model (in JSON)
    ● Key > Column Family > Super Column > Column

    {
      "KeyA": {
        "Tags": {
          "cassandra": {
            "incubator": {"timestamp": ""},
            "jira": {"timestamp": ""}
          },
          "thrift": {
            "jira": {"timestamp": ""}
          }
        }
      }
    }
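The two JSON layouts differ only in nesting depth: a column is reached in three steps through a column family, and in four steps through a super column. A minimal sketch with plain Python dictionaries, using values copied from the slides; `get_path` is a hypothetical helper, not a Cassandra API.

```python
from typing import Any, Dict

# Key > Column Family > Column (from slide 5)
column_family_data: Dict[str, Any] = {
    "keyA": {"Stats": {"visits": {"timestamp": "3", "value": "243"}}},
}

# Key > Column Family > Super Column > Column (from slide 6)
super_column_data: Dict[str, Any] = {
    "KeyA": {
        "Tags": {
            "cassandra": {
                "incubator": {"timestamp": ""},
                "jira": {"timestamp": ""},
            },
            "thrift": {"jira": {"timestamp": ""}},
        }
    }
}


def get_path(data: Dict[str, Any], *path: str) -> Any:
    """Hypothetical helper: walk nested dictionaries one key at a time."""
    node: Any = data
    for part in path:
        node = node[part]
    return node


visits = get_path(column_family_data, "keyA", "Stats", "visits")["value"]
cassandra_tags = sorted(get_path(super_column_data, "KeyA", "Tags", "cassandra"))
```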
  • 7. Differences from Dynamo
    ● Partitioning
      ○ Dynamo places virtual nodes on the hash ring according to the capacity of each host node
      ○ Cassandra analyzes load information on the hash ring and moves lightly loaded nodes to relieve heavily loaded ones
    ● Replication (Cassandra's placement policies)
      ○ "Rack Unaware"
      ○ "Rack Aware"
      ○ "Datacenter Aware"
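As a rough illustration of the hash ring and of the simplest policy, "Rack Unaware", the sketch below picks the first node clockwise from a key's token as coordinator and adds the next N-1 nodes on the ring as replicas. The class name `Ring`, the node names, the token values, and the use of MD5 are all assumptions made for this example; they are not taken from Cassandra's code.

```python
import bisect
import hashlib
from typing import Dict, List


def token(key: str, ring_size: int = 2**32) -> int:
    """Map a key to a ring position (MD5 is an arbitrary choice for this sketch)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_size


class Ring:
    """Hypothetical ring: each node owns one token; no virtual nodes."""

    def __init__(self, node_tokens: Dict[str, int]) -> None:
        self._entries = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def replicas(self, key_token: int, n: int) -> List[str]:
        """Coordinator is the first node at or after key_token; then n-1 successors."""
        size = len(self._entries)
        start = bisect.bisect_left(self._tokens, key_token) % size  # wrap around
        return [self._entries[(start + i) % size][1] for i in range(min(n, size))]


ring = Ring({"A": 100, "B": 200, "C": 300, "D": 400})
mid_ring = ring.replicas(250, 3)      # coordinator C, then D, then wraps to A
past_end = ring.replicas(450, 2)      # wraps around to A, then B
for_key = ring.replicas(token("keyA"), 3)
```

"Rack Aware" and "Datacenter Aware" would constrain which successors qualify as replicas; that logic is left out here.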
  • 8. Differences from Dynamo
    ● Failure Detection
      ○ Dynamo uses a gossip-based protocol for membership changes; a node is assumed to have failed if it does not respond
      ○ Cassandra uses the same kind of gossip-based protocol but adds a φ (phi) Accrual Failure Detector
        ■ Does not emit a boolean up-or-down verdict
        ■ Emits a value, φ, that represents a suspicion level
        ■ The suspicion scale adjusts dynamically to the conditions observed through gossip messages
        ■ A sliding window of gossip message inter-arrival times is kept for each node
        ■ A statistical distribution is fitted to that window to compute φ
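A minimal sketch of an accrual detector, assuming inter-arrival times follow an exponential distribution (the approximation described in the Cassandra paper). With mean interval m and time t since the last heartbeat, the probability that the next heartbeat is still to come is exp(-t/m), so φ = -log10(exp(-t/m)) = t / (m · ln 10). The class name, method names, and default window size are hypothetical.

```python
import math
from collections import deque
from typing import Deque, Optional


class PhiAccrualDetector:
    """Hypothetical detector for one monitored node."""

    def __init__(self, window_size: int = 1000) -> None:
        # Sliding window of heartbeat inter-arrival times, oldest dropped first.
        self._intervals: Deque[float] = deque(maxlen=window_size)
        self._last_arrival: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        """Record the arrival time of a gossip message from the node."""
        if self._last_arrival is not None:
            self._intervals.append(now - self._last_arrival)
        self._last_arrival = now

    def phi(self, now: float) -> float:
        """Suspicion level: grows the longer the node stays silent."""
        if self._last_arrival is None or not self._intervals:
            return 0.0  # not enough samples to judge
        mean = sum(self._intervals) / len(self._intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self._last_arrival
        # phi = -log10(exp(-elapsed / mean)) = elapsed / (mean * ln 10)
        return elapsed / (mean * math.log(10))


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):  # one heartbeat per second
    detector.heartbeat(t)
phi_soon = detector.phi(4.0)
phi_late = detector.phi(10.0)
```

A caller would compare φ against a threshold of its own choosing; a higher threshold means fewer false suspicions but slower detection.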
  • 9. Differences from BigTable
    ● Data Model
      ○ BigTable stores <K,V> pairs in SSTables, grouped by Column Family, and keeps historical versions
      ○ Cassandra drops historical versions and adds the super column concept
    ● Storage
      ○ BigTable uses the Google File System (GFS)
      ○ Cassandra uses the local file system
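Dropping historical versions means that each column keeps a single value, and the timestamp decides which write survives. A small sketch of that rule; `apply_write` and the `Cell` alias are hypothetical names, and keeping the existing value on a timestamp tie is a simplification chosen for this example.

```python
from typing import Dict, Tuple

Cell = Tuple[int, str]  # (timestamp, value)


def apply_write(row: Dict[str, Cell], name: str, timestamp: int, value: str) -> None:
    """Keep only the newest value for a column; older writes are discarded."""
    current = row.get(name)
    if current is None or timestamp > current[0]:
        row[name] = (timestamp, value)


row: Dict[str, Cell] = {}
apply_write(row, "visits", 3, "243")
apply_write(row, "visits", 1, "100")  # older timestamp, ignored
apply_write(row, "visits", 5, "250")  # newer timestamp, replaces the value
```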
  • 10. Cassandra