
Apache Cassandra, part 1 – principles, data model



The aim of this presentation is to give an enterprise architect enough information to decide whether Cassandra should be the project's data store. It describes the nuances of Cassandra's architecture and the ways to design data and work with it.

Published in: Technology


  1. Apache Cassandra, part 1 – principles, data model
  2. I. RDBMS Pros and Cons
  3. Pros
     - Good balance between functionality and usability; powerful tool support.
     - SQL has a feature-rich syntax.
     - A set of widely accepted standards.
     - Consistency.
  4. Scalability
     - RDBMSs were the mainstream for decades, until scalability requirements increased dramatically.
     - The complexity of the processed data structures also increased dramatically.
  5. Scaling
     Two ways to achieve scalability:
     - Vertical scaling
     - Horizontal scaling
  6. CAP Theorem
  7. Cons
     - Cost of distributed transactions.
     - No availability support: two DBs with 99.9% availability each give 100% - 2 * (100% - 99.9%) = 99.8% combined availability (~86 min. of downtime per month).
     - Additional synchronization overhead.
     - As slow as the slowest DB node, plus network latency.
     - 2PC is a blocking protocol; it is possible to lock resources forever.
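The availability arithmetic above can be checked in a few lines of Python. This is a minimal sketch of the slide's formula; the 30-day month used for the downtime estimate is an assumption.

```python
# Combined availability of a system that needs BOTH of two databases up,
# following the slide's formula: 100% - 2 * (100% - per-node availability).
node = 0.999                       # each DB node is up 99.9% of the time

system = 1 - 2 * (1 - node)
print(f"{system:.3%}")             # 99.800%

# Expected downtime over an (assumed) 30-day month, in minutes:
downtime_min = (1 - system) * 30 * 24 * 60
print(f"~{downtime_min:.0f} minutes/month")
```

Note that 0.2% of a 30-day month is roughly 86 minutes, twice the downtime of a single 99.9% node.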
  8. Cons
     - Use of master–slave replication makes the write side (the master) a performance bottleneck and requires additional CPU/IO resources.
     - There is no partition tolerance.
  9. Sharding
     - Feature sharding
     - Hash code sharding
     - Lookup table – the node that holds the lookup table is a performance bottleneck and a single point of failure.
  10. Feature sharding
     DB instances are divided by DB function.
  11. Hash code sharding
     Data is divided across DB instances by hash code ranges.
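Hash code sharding as described above can be sketched in a few lines. This is illustrative only: the instance names, the MD5 hash, and the equal range split are assumptions, not Cassandra's actual partitioner.

```python
import hashlib

# Hypothetical DB instances; the 128-bit hash range is split into equal
# contiguous ranges, one per instance.
NODES = ["db0", "db1", "db2", "db3"]

def shard_for(key: str) -> str:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)  # 0 .. 2**128 - 1
    return NODES[h * len(NODES) // 2**128]              # equal range split

# Keys spread across instances regardless of their domain meaning:
for k in ("user:42", "order:7", "user:43"):
    print(k, "->", shard_for(k))
```

Because placement depends only on the hash, related domain entities may land on different instances, which is exactly the consistency trade-off the next slides discuss.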
  12. Sharding consistency
     For efficient sharding, data should be eventually consistent.
  13. Feature vs. hash code sharding
     - Feature sharding allows consistency to be tuned at domain-logic granularity, but the load may be poorly balanced.
     - Hash code sharding balances the load well but does not allow consistency at the domain-logic level.
  14. Cassandra sharding
     - Cassandra uses hash code load balancing.
     - Cassandra is a better fit for reporting than for business-logic processing.
     - Cassandra + Hadoop == an OLAP server with high performance and availability.
  15. II. Apache Cassandra: Overview
  16–17. Cassandra = Amazon Dynamo + Google BigTable
     - Amazon Dynamo (architecture): DHT, eventual consistency, tunable trade-offs and consistency.
     - Google BigTable (data model): values are structured and indexed; column families and columns.
  18. Distributed and decentralized
     - No master/slave nodes (server symmetry)
     - No single point of failure
  19. DHT
     A distributed hash table (DHT) is a class of decentralized distributed systems that provides a lookup service similar to a hash table: (key, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key.
  20. DHT
     - Keyspace
     - Keyspace partitioning
     - Overlay network
  21. Keyspace
     - An abstract keyspace, such as the set of 128- or 160-bit strings.
     - A keyspace partitioning scheme splits ownership of this keyspace among the participating nodes.
  22. Keyspace partitioning
     - Keyspace distance function δ(k1, k2).
     - A node with ID i_x owns all the keys k_m for which i_x is the closest ID, measured according to δ(k_m, i_x).
  23. Keyspace partitioning
     Imagine mapping the range from 0 to 2^128 onto a circle, so the values wrap around.
  24. Keyspace partitioning
     Consider what happens if node C is removed.
  25. Keyspace partitioning
     Consider what happens if node D is added.
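The ring from the slides above can be sketched as a toy consistent-hash ring over the 0 .. 2^128 - 1 keyspace. Node names and the use of MD5 are illustrative assumptions; a key is owned by the first node clockwise from its position.

```python
import bisect
import hashlib

def token(name: str) -> int:
    # Position on the 0 .. 2**128 - 1 circle (MD5 used purely as an example).
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

def owner(key: str, nodes) -> str:
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    # First node clockwise from the key, wrapping around past zero.
    i = bisect.bisect_right(tokens, token(key)) % len(ring)
    return ring[i][1]

nodes = {"A", "B", "C", "D"}
print(owner("some-row-key", nodes))
# Removing a node only moves the keys that node owned to its successor;
# every other key keeps its owner:
print(owner("some-row-key", nodes - {"C"}))
```

This is why node removal and addition (slides 24–25) only redistribute the affected range instead of reshuffling the entire keyspace.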
  26. Overlay network
     - For any key k, each node either owns k or has a link to a node whose ID is closer to k.
     - Greedy routing (not necessarily globally optimal): at each step, forward the message to the neighbor whose ID is closest to k.
  27. Elastic scalability
     Adding or removing a node doesn't require reconfiguring Cassandra, changing application queries, or restarting the system.
  28. High availability and fault tolerance
     - Cassandra picks A and P from CAP.
     - Eventual consistency.
  29. Tunable consistency
     - Replication factor: the number of copies of each piece of data.
     - Consistency level: the number of replicas to access on every read/write operation.
  30. Quorum consistency level
     R = N/2 + 1
     W = N/2 + 1
     R + W > N
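The quorum condition above is easy to verify mechanically: with replication factor N, choosing R = W = N/2 + 1 (integer division) guarantees R + W > N, so every read quorum overlaps every write quorum and observes the latest write. A minimal sketch:

```python
# Quorum size for replication factor n: a majority of replicas.
def quorum(n: int) -> int:
    return n // 2 + 1

for n in (1, 3, 5):
    r = w = quorum(n)
    # r + w > n means read and write quorums always share a replica.
    print(f"N={n}: R={r}, W={w}, R+W>N: {r + w > n}")
```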
  31. Hybrid orientation
     - Column orientation: columns aren't fixed; columns can be sorted; columns can be queried over a certain range.
     - Row orientation: each row is uniquely identifiable by its key; rows group columns and super columns.
  32. Schema-free
     - You don't have to define columns when you create the data model.
     - You think of the queries you will run and then design the data around them.
  33. High performance
     Reading and writing 50 GB of data:
     - Cassandra: write 0.12 ms, read 15 ms
     - MySQL: write 300 ms, read 350 ms
  34. III. Data Model
  35. Relational data model
     (diagram: a Database containing Table1 and Table2)
  36. Cassandra data model
     (diagram: a Keyspace containing a Column Family; RowKey1 maps Column1, Column2, Column3 to Value1, Value2, Value3, while RowKey2 maps only Column1 and Column4 to values)
  37. Keyspace
     - A keyspace is close to a relational database.
     - Basic attributes: replication factor, replica placement strategy, column families (the tables of the relational model).
     - It is possible to create several keyspaces per application (for example, if you need a different replica placement strategy or replication factor).
  38. Column family
     - A container for a collection of rows.
     - A column family is close to a table from the relational data model.
     (diagram: a row whose RowKey maps Column1, Column2, Column3 to Value1, Value2, Value3)
  39. Column family vs. Table
     - The store represents a four-dimensional hash: map[Keyspace][ColumnFamily][Key][Column].
     - Columns are not strictly defined in a column family; you can freely add any column to any row at any time.
     - A column family can hold columns or super columns (collections of subcolumns).
  40. Column family vs. Table
     - A column family has a comparator attribute which indicates how columns are sorted in query results (as long, byte, UTF8, etc.).
     - Each column family is stored in a separate file on disk, so it's useful to keep related columns in the same column family.
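The "four-dimensional hash map" view from slide 39 can be sketched with nested dicts. The keyspace, column family, and row names below are made up for illustration:

```python
from collections import defaultdict

# store[Keyspace][ColumnFamily][RowKey][Column] -> value
store = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

# Columns are not strictly defined: rows in the same column family may
# carry completely different sets of columns.
store["app"]["Users"]["rowkey1"]["name"] = "Alice"
store["app"]["Users"]["rowkey1"]["email"] = "alice@example.com"
store["app"]["Users"]["rowkey2"]["city"] = "Oslo"   # no name/email here

print(store["app"]["Users"]["rowkey1"])
```

A super column family would simply add one more level of nesting, giving the five-dimensional hash described on slide 47.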
  41. Column
     The basic unit of data structure:
     - name: byte[]
     - value: byte[]
     - clock: long
  42. Skinny and wide rows
     - Wide rows: a huge number of columns and a few rows (used to store lists of things).
     - Skinny rows: a small number of columns and many different rows (close to the relational model).
  43. Disadvantages of wide rows
     - They work badly with the row cache.
     - With many rows and many columns you end up with large indexes (~40 GB of data and a 10 GB index).
  44. Column sorting
     - Column sorting is typically important only with the wide model.
     - The comparator is an attribute of the column family that specifies how column names are compared for sort order.
  45. Comparator types
     Cassandra has the following predefined types:
     - AsciiType
     - BytesType
     - LexicalUUIDType
     - IntegerType
     - LongType
     - TimeUUIDType
     - UTF8Type
  46. Super column
     - Stores a map of subcolumns: name: byte[], cols: Map<byte[], Column>.
     - Cannot store a map of super columns (only one level deep).
  47. Five-dimensional hash
     [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
  48. Super column
     Sometimes it is useful to use composite keys instead of super columns:
     - when more than one level of depth is needed;
     - because of performance issues.
  49–50. Super column family
     Column families:
     - Standard (default): can combine columns and super columns.
     - Super: stricter schema constraints; can store only super columns; a subcomparator can be specified for the subcolumns.
  51. Note that
     There are no joins in Cassandra, so you can:
     - join data on the client side;
     - create a denormalized second column family.
  52. IV. Advanced column types
  53. TTL column type
     - A TTL column is a column whose value expires after a given period of time.
     - Useful for storing session tokens.
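The TTL semantics can be illustrated client-side. Cassandra expires the value server-side; this class, with its made-up names, only sketches the observable behaviour for something like a session token:

```python
import time

class TTLColumn:
    """A column whose value reads as absent once its TTL elapses."""

    def __init__(self, name: str, value: str, ttl_seconds: float):
        self.name = name
        self.value = value
        self.expires_at = time.time() + ttl_seconds

    def read(self):
        # Live value before the deadline, nothing after it.
        return self.value if time.time() < self.expires_at else None

token = TTLColumn("session_token", "abc123", ttl_seconds=0.05)
print(token.read())        # live value while the column has not expired
time.sleep(0.1)
print(token.read())        # expired: reads as None
```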
  54. Counter column
     - In an eventually consistent environment, old versions of column values are overridden by new ones, but counters should be cumulative.
     - Counter columns are intended to support increment/decrement operations in an eventually consistent environment without losing any of them.
  55. CounterColumn internals
     CounterColumn structure:
     name
     ...
     [
       (replicaId1, counter1, logical clock1),
       (replicaId2, counter2, logical clock2),
       ...
       (replicaIdN, counterN, logical clockN)
     ]
  56. CounterColumn write – before
     UPDATE CounterCF SET count_me = count_me + 2 WHERE key = 'counter1'
     [
       (A, 10, 2),
       (B, 3, 4),
       (C, 6, 7)
     ]
  57. CounterColumn write – after
     A is the leader:
     [
       (A, 10 + 2, 2 + 1),
       (B, 3, 4),
       (C, 6, 7)
     ]
  58. CounterColumn read
     All Memtables and SSTables are read through using the following algorithm: all tuples with the local replicaId are summed up, and for each foreign replica the tuple with the maximum logical clock value is chosen.
     Counters of foreign replicas are updated during read repair, during the replicate-on-write procedure, or by the anti-entropy service (AES).
  59. CounterColumn read – example
     Memtable: (A, 12, 4) (B, 3, 5) (C, 10, 3)
     SSTable1: (A, 5, 3) (B, 1, 6) (C, 5, 4)
     SSTable2: (A, 2, 2) (B, 2, 4) (C, 6, 2)
     Result: (A, 19, 9) + (B, 1, 6) + (C, 5, 4) = 19 + 1 + 5 = 25
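The merge rule from slides 58–59 can be sketched directly: sum every tuple belonging to the local replica (A), and for each foreign replica keep only the count with the highest logical clock. This reproduces the example's result:

```python
# Each table maps replicaId -> (count, logical_clock), copied from slide 59.
tables = [
    {"A": (12, 4), "B": (3, 5), "C": (10, 3)},   # Memtable
    {"A": (5, 3),  "B": (1, 6), "C": (5, 4)},    # SSTable1
    {"A": (2, 2),  "B": (2, 4), "C": (6, 2)},    # SSTable2
]

def merge(tables, local="A"):
    total = 0
    for replica in {r for t in tables for r in t}:
        shards = [t[replica] for t in tables if replica in t]
        if replica == local:
            # Local shards are cumulative: sum them all.
            total += sum(count for count, _ in shards)
        else:
            # Foreign shards: the tuple with the highest clock wins.
            total += max(shards, key=lambda s: s[1])[0]
    return total

print(merge(tables))   # 19 + 1 + 5 = 25
```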
  60. Resources
     - Home of the Apache Cassandra Project
     - Apache Cassandra Wiki
     - Documentation provided by DataStax
     - A good explanation of creating secondary indexes
     - Eben Hewitt, “Cassandra: The Definitive Guide”, O’Reilly, 2010, ISBN 978-1-449-39041-9
  61. Authors
     - Lev Sivashov
     - Andrey Lomakin – twitter: @Andrey_Lomakin, LinkedIn:
     - Artem Orobets – enisher@gmail.com, twitter: @Dr_EniSh
     - Anton Veretennik