Introduction to Cassandra

  1. Introduction to Cassandra. Shimi Kiviti, @shimi_k
  2. Motivation
     Scaling: how do you scale your database?
     ● reads
     ● writes
  3. Influential Papers
     ● Bigtable: A Distributed Storage System for Structured Data, 2006
     ● Dynamo: Amazon's Highly Available Key-value Store, 2007
     What Cassandra takes from each:
     ● partitioning and replication - Dynamo
     ● log-structured column family storage - Bigtable
  4. Cassandra Highlights
     ● Symmetric - all nodes are exactly the same
       ○ No single point of failure
       ○ Linearly scalable
       ○ Ease of administration
     ● High availability with multiple datacenters
     ● Consistency vs. latency
     ● Read/write anywhere
     ● Flexible schema
     ● Column TTL
     ● Distributed counters
  5. DHT - Distributed Hash Table
  6. DHT
     ● O(1) node lookup
     ● Explicit replication
     ● Linear scalability
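
The node lookup and explicit replication listed on the DHT slide can be sketched as a token ring. This is a minimal illustration and not Cassandra's implementation: the names `TokenRing`, `token`, and `replicas` are hypothetical, MD5 is used only as an example hash, and the ring is assumed to be a sorted map from token to node name that every node holds in full (which is what makes the lookup a single network hop).

```scala
import java.math.BigInteger
import java.security.MessageDigest
import scala.collection.immutable.TreeMap

object TokenRing {
  // Hash a row key to a non-negative token (MD5 is an illustrative choice).
  def token(key: String): BigInt = {
    val digest = MessageDigest.getInstance("MD5").digest(key.getBytes("UTF-8"))
    BigInt(new BigInteger(1, digest))
  }

  // Walk the ring clockwise from the key's token, wrapping around once,
  // and collect the first `n` distinct nodes as the replica set.
  def replicas(ring: TreeMap[BigInt, String], key: String, n: Int): List[String] = {
    val clockwise = ring.iteratorFrom(token(key)) ++ ring.iterator
    clockwise.map(_._2).toList.distinct.take(n)
  }
}
```

With a three-node ring, `TokenRing.replicas(ring, "shimi", 2)` returns the node that owns the key's token plus the next node clockwise.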
  7. Consistency
     N = replication factor
     R = number of replicas to block for on a read (R <= N)
     W = number of replicas to block for on a write (W <= N)
     Quorum = floor(N/2) + 1
     When W + R > N, reads are fully consistent, because every read set overlaps every write set.
     Examples:
     ● W = 1, R = N
     ● W = N, R = 1
     ● W = Quorum, R = Quorum
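
The quorum arithmetic on the Consistency slide can be checked with a few lines of Scala. This is only a sketch of the formulas as stated there; the object and method names (`Consistency`, `quorum`, `isFullyConsistent`) are made up for the example.

```scala
object Consistency {
  // Quorum for replication factor n: a strict majority of the replicas.
  // Integer division gives floor(n / 2).
  def quorum(n: Int): Int = n / 2 + 1

  // The read set and the write set must share at least one replica
  // whenever w + r > n, so a read always sees the latest acknowledged write.
  def isFullyConsistent(n: Int, w: Int, r: Int): Boolean = {
    require(w >= 1 && w <= n && r >= 1 && r <= n, "w and r must be in 1..n")
    w + r > n
  }
}
```

For N = 3, a quorum is 2 replicas, so quorum writes plus quorum reads give 2 + 2 > 3, while W = 1 with R = 1 does not satisfy the condition.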
  8. Consistency Level
     ● Every request defines its consistency level
       ○ Any
       ○ One
       ○ Two
       ○ Three
       ○ Quorum
       ○ Local Quorum
       ○ Each Quorum
       ○ All
  9. Data Model
     ● Keyspace ~ schema
     ● ColumnFamilies ~ table
     ● Rows
     ● Columns
  10. Column Family
      Key1 | Column | Column | Column
      Key2 | Column | Column
  11. Column Family
      ColumnFamily: {
        TOK: {
          chen: 1,
          ronen: 7
        }
        CityPath: {
          yuval: 5
        }
      }
  12. Super Column Family
      Key | Super1 | Column | Column | Column
          | Super2 | Column | Column | Column
      ColumnFamily: {
        Key: {
          super1: {
            name: value,
            name: value
          }
          super2: {
            name: value
          }
        }
      }
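
One way to picture the column family and super column family slides is as nested maps. The sketch below is an in-memory analogy only, not how Cassandra stores data; the type aliases and the `example` value are hypothetical, with row keys and columns copied from the Column Family slide. A sorted map is used for columns because columns within a row are kept in sorted order.

```scala
import scala.collection.immutable.SortedMap

object DataModel {
  type Columns = SortedMap[String, String]                 // column name -> value
  type ColumnFamily = Map[String, Columns]                 // row key -> columns
  type SuperColumnFamily = Map[String, SortedMap[String, Columns]] // row key -> super column -> columns

  // The rows shown on the Column Family slide.
  val example: ColumnFamily = Map(
    "TOK"      -> SortedMap("ronen" -> "7", "chen" -> "1"),
    "CityPath" -> SortedMap("yuval" -> "5")
  )

  // Column names of one row, in sorted order; empty if the row is absent.
  def columnNames(cf: ColumnFamily, rowKey: String): List[String] =
    cf.get(rowKey).map(_.keys.toList).getOrElse(Nil)
}
```

Even though `ronen` is inserted before `chen`, `DataModel.columnNames(DataModel.example, "TOK")` lists `chen` first.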
  13. Write
      ● Any node
      ● Partitioner
      ● Commit log, memtable
      ● Wait for W responses
  14. Write
  15. Write
      ● No reads
      ● No seeks
      ● Sequential disk access
      ● Atomic within a column family
      ● Fast
      ● Always writeable (hinted hand-off)
  16. Read
      ● Choose any node
      ● Partitioner
      ● Wait for R responses
      ● Tunable read repair in the background
  17. Read
      ● A read may have to consult multiple SSTables
      ● Slower than writes
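
The write and read slides describe a log-structured storage path: append to a commit log, update a memtable, flush to immutable SSTables, and on read consult the memtable and then possibly several SSTables. The toy class below sketches that flow on a single node under heavy simplifications (no disk I/O, no timestamps, no compaction, no replication); `ToyStore` and `flushThreshold` are invented names for the example.

```scala
import scala.collection.immutable.SortedMap
import scala.collection.mutable.ArrayBuffer

class ToyStore(flushThreshold: Int) {
  // Append-only log: a write never reads or seeks.
  val commitLog = ArrayBuffer.empty[(String, String)]
  private var memtable = SortedMap.empty[String, String]
  // Immutable flushed tables, newest first.
  private var sstables = List.empty[SortedMap[String, String]]

  def write(key: String, value: String): Unit = {
    commitLog += ((key, value))
    memtable = memtable + (key -> value)
    if (memtable.size >= flushThreshold) {
      sstables = memtable :: sstables          // flush the memtable as a new SSTable
      memtable = SortedMap.empty[String, String]
    }
  }

  // Check the memtable first, then each SSTable from newest to oldest.
  def read(key: String): Option[String] =
    memtable.get(key).orElse(sstables.flatMap(_.get(key)).headOption)

  def sstableCount: Int = sstables.size
}
```

The read path touches more structures than the write path, which is the reason the slides give for reads being slower than writes.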
  18. Cache
      ● There is no need to use memcached
      ● There is an internal configurable cache
        ○ Key cache
        ○ Row cache
  19. Sorting
      When you perform a get, the result is sorted:
      ● Rows are sorted according to the partitioner
      ● Columns in a row are sorted according to the type of the column name
  20. Partitioner
      ● RandomPartitioner - uses hash values as tokens. Useful for distributing the load across all nodes. If you use it, set the nodes' tokens manually.
      ● OrderPreservingPartitioner - lets you read rows in sorted order, at the cost of an evenly balanced cluster.
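
Setting tokens manually for RandomPartitioner usually means spacing them evenly around the ring. The sketch below assumes the commonly cited token range of 0 to 2^127 and the usual formula token_i = i * 2^127 / nodeCount; `Tokens` and `evenTokens` are hypothetical names, and the values should be checked against the documentation for the Cassandra version in use.

```scala
object Tokens {
  // Assumed size of the RandomPartitioner token space.
  val RingSize: BigInt = BigInt(2).pow(127)

  // Evenly spaced initial tokens, one per node, starting at 0.
  def evenTokens(nodeCount: Int): List[BigInt] = {
    require(nodeCount > 0, "nodeCount must be positive")
    (0 until nodeCount).map(i => RingSize * i / nodeCount).toList
  }
}
```

For a four-node cluster, `Tokens.evenTokens(4)` yields 0, 2^125, 2^126, and 3 * 2^125, so each node owns a quarter of the ring.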
  21. Column Types
      Available types:
      ● Bytes
      ● UTF8
      ● Ascii
      ● Long
      ● Date
      ● UUID
      ● Composite - <Type1>:<Type2>
  22. Column Types
      Examples:
      Sort 1 (numeric order vs. text order):
        8        |  10
        9        |  8
        10       |  9
      Sort 2 (composite order vs. text order):
        dan:8    |  dan:10
        dan:10   |  dan:8
        shimi:1  |  shimi:1
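
The two orderings in the Column Types examples can be reproduced with plain Scala collections. This is an analogy for how the column name type drives the sort order, not Cassandra code; `ColumnOrder` and its members are illustrative names, and a `(String, Long)` tuple stands in for a composite type.

```scala
object ColumnOrder {
  val names = List("10", "8", "9")

  // Compared as numbers, like a Long column name type.
  val asLong: List[String] = names.sortBy(_.toLong)
  // Compared character by character, like a text column name type.
  val asText: List[String] = names.sorted

  val composite = List(("dan", 10L), ("shimi", 1L), ("dan", 8L))

  // Compared component by component: the string first, then the number.
  val byComponents: List[(String, Long)] = composite.sorted
  // The same names flattened to text and compared as plain strings.
  val byText: List[String] = composite.map { case (s, n) => s"$s:$n" }.sorted
}
```

Text comparison puts `10` before `8` because the character `1` sorts before `8`, which is why the choice of column name type matters.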
  23. Clients
      ● Thrift - Cassandra driver-level interface
      ● CQL - Cassandra Query Language (SQL-like)
      ● High-level clients:
        ○ Python
        ○ Java
        ○ Scala
        ○ Clojure
        ○ .Net
        ○ Ruby
        ○ PHP
        ○ Perl
        ○ C++
        ○ Haskell
  24. Cascal - Scala client
      Insert column:
        session.insert("app" \ "users" \ "shimi" \ "passwd" \ "mypass")
        val key = "app" \ "users" \ "shimi"
        session.insert(key \ "email" \ "shimi.k@...")
      Get column value:
        val pass = session.get(key \ "passwd")
  25. Cascal
      Get multiple columns:
        // all columns of the row
        val row = session.list(key)
        // columns whose names fall in a range
        val colRange = session.list(key, RangePredicate("email", "passwd"))
        // columns selected by name
        val namedCols = session.list(key, ColumnPredicate(List("passwd", "email")))
  26. Cascal
      Get multiple rows:
        val family = "app" \ "users"
        // rows in a key range
        val rowRange = session.list(family, RangePredicate("dan", "shimi"))
        // rows selected by key
        val namedRows = session.list(family, KeyPredicate("dan", "shimi"))
  27. Cascal
      Remove column:
        session.remove("app" \ "users" \ "shimi" \ "passwd")
      Remove row:
        session.remove("app" \ "users" \ "shimi")
      Batch operations:
        val deleteCols = Delete(key, ColumnPredicate("age" :: "sex" :: Nil))
        val insertEmail = Insert(key \ "email" \ "shimi.k@...")
        session.batch(insertEmail :: deleteCols :: Nil)
  28. Guidelines
      ● Keep together the data you query together
      ● Think about your use case and how you will need to fetch your data
      ● Don't try to normalize your data
      ● You can't beat the disk
      ● Be ready to get your hands dirty
      ● There is no single solution for everything. You might consider using different solutions together
  29. The End
      Useful links:
      ● Cassandra, http://cassandra.apache.org/
      ● Wiki, http://wiki.apache.org/cassandra/
      ● Cassandra mailing list
      ● IRC
      ● Bigtable, http://labs.google.com/papers/bigtable.html
      ● Dynamo, http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
      ● Cascal, https://github.com/shimi/cascal