
NoSQL Essentials: Cassandra

Published in: Technology

  1. NoSQL Essentials: Cassandra & Dynamo-like Databases. Buenos Aires, Argentina, Nov 2012. Fernando Rodriguez Olivera, @frodriguez, nosqlessentials.com
  2-9. Hash Partitioning
      [Diagram: a client and four nodes, A = 0, B = 1, C = 2, D = 3, so N = 4.]
      Each key is stored on the node numbered hash(key) mod N:
          hash("hello") mod 4 = 2  ->  stored on C
          hash("world") mod 4 = 0  ->  stored on A
          hash("bye")   mod 4 = 3  ->  stored on D
      Drawback: it is difficult to add or remove nodes, because changing N changes the node assigned to most keys.
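The cost of changing N can be illustrated with a short Python sketch. The slides do not say which hash function is used, so MD5 (via `hashlib`) and the generated key names are assumptions made only for illustration.

```python
import hashlib

def node_for(key: str, n: int) -> int:
    """Return the node index for key under hash(key) mod n partitioning."""
    # MD5 is an assumed stand-in for the unspecified hash function.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n

def moved_keys(keys, old_n: int, new_n: int) -> int:
    """Count keys whose node changes when the node count goes from old_n to new_n."""
    return sum(1 for k in keys if node_for(k, old_n) != node_for(k, new_n))

if __name__ == "__main__":
    keys = [f"key-{i}" for i in range(10_000)]
    moved = moved_keys(keys, 4, 5)
    print(f"{moved} of {len(keys)} keys change node when N goes from 4 to 5")
```

With a uniform hash, a key keeps its node only when hash mod 4 equals hash mod 5, which holds for roughly one key in five, so about 80% of the keys have to move.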
  10-20. Consistent Hashing / Random Tokens
      E.g. address space 0..Max = 0..65535, using a hash function with range 0..Max.
      [Diagram: a ring labeled 0/65535, 16384, 32768, and 49152, with a client next to it.]
      Each node is placed on the ring at a random token:
          token A = 33015
          token B =  8915
          token C = 31541
          token D = 40927
      Keys are hashed onto the same ring:
          hash("hello") = 13209
          hash("world") = 36551
          hash("bye")   = 60912
      [Diagram: "hello", "world", and "bye" are each drawn on the ring next to the node that stores them.]
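The ring lookup can be sketched in Python using the tokens and key hashes from the slides. The sketch assumes the usual Cassandra convention that a key belongs to the first node whose token is at or after the key's hash, walking clockwise and wrapping around; the function name `owner` and the extra node E are hypothetical.

```python
from bisect import bisect_left

# Tokens copied from the slides (address space 0..65535).
TOKENS = {"A": 33015, "B": 8915, "C": 31541, "D": 40927}

def owner(key_hash: int, tokens: dict) -> str:
    """Return the node with the first token at or after key_hash, wrapping around the ring."""
    ring = sorted((token, node) for node, token in tokens.items())
    positions = [token for token, _ in ring]
    # An index equal to len(ring) means we passed the last token, so wrap to the first node.
    index = bisect_left(positions, key_hash) % len(ring)
    return ring[index][1]

if __name__ == "__main__":
    for key, key_hash in [("hello", 13209), ("world", 36551), ("bye", 60912)]:
        print(key, "->", owner(key_hash, TOKENS))
```

Adding a hypothetical node E with token 20000 only takes over the keys between B's token and 20000 (for example "hello"); every other key stays where it was, which is the advantage over mod-N partitioning.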
  21. Consistent Hashing / Virtual Nodes
      [Diagram: a ring 0..65535 on which nodes A, B, C, and D each appear at several interleaved positions.]
      4 virtual nodes per node, each with a random token:
          token A.1 = ...
          token A.2 = ...
          token A.3 = ...
          token A.4 = ...
          token B.1 = ...
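A minimal Python sketch of virtual nodes, assuming the same 0..65535 address space as the earlier slides. The helper names, the fixed seed, and the use of `random.sample` to pick distinct tokens are illustrative assumptions, not Cassandra's actual token allocation.

```python
import random
from bisect import bisect_left

MAX_TOKEN = 65535  # address space used in the earlier slides

def build_ring(nodes, vnodes_per_node=4, seed=42):
    """Give every node several distinct random tokens; return a sorted list of (token, node)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    tokens = rng.sample(range(MAX_TOKEN + 1), len(nodes) * vnodes_per_node)
    ring = []
    for i, node in enumerate(nodes):
        for token in tokens[i * vnodes_per_node:(i + 1) * vnodes_per_node]:
            ring.append((token, node))
    return sorted(ring)

def owner(ring, key_hash):
    """Return the node owning key_hash: first token at or after it, wrapping around."""
    positions = [token for token, _ in ring]
    return ring[bisect_left(positions, key_hash) % len(ring)][1]

if __name__ == "__main__":
    ring = build_ring(["A", "B", "C", "D"])
    print(ring)
    print("hello ->", owner(ring, 13209))
```

Because each physical node holds several small ranges scattered around the ring, a node that joins or leaves exchanges data with many peers instead of a single neighbor.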
  22. Consistent Hashing / Manual Placement
      [Diagram: a ring with evenly spaced positions labeled 0 through 8.]
      Uniform distribution. Calculated tokens. Adding or removing a node requires rebalancing.
  23. Token Generation
      #> cassandra/tools/bin/token-generator
      Token Generator Interactive Mode
      --------------------------------
       How many datacenters will participate in this Cassandra cluster? 1
       How many nodes are in datacenter #1? 5
      DC #1:
        Node #1:                                        0
        Node #2:   34028236692093846346337460743176821145
        Node #3:   68056473384187692692674921486353642290
        Node #4:  102084710076281539039012382229530463435
        Node #5:  136112946768375385385349842972707284580
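The tokens printed above can be reproduced with a few lines of Python. The formula is inferred from the output rather than taken from the tool's source: it assumes a ring of size 2**127 (the RandomPartitioner token range) and an integer step of ring size divided by node count. The function name is hypothetical.

```python
def evenly_spaced_tokens(node_count: int, ring_size: int = 2 ** 127) -> list:
    """Return node_count tokens spaced one integer step apart, starting at 0."""
    step = ring_size // node_count  # integer division, as the slide's output suggests
    return [i * step for i in range(node_count)]

if __name__ == "__main__":
    for number, token in enumerate(evenly_spaced_tokens(5), start=1):
        print(f"Node #{number}: {token:>40}")
```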
  24. Partitioning Strategy
      RandomPartitioner (consistent hashing)
      ByteOrderedPartitioner
      Cassandra documentation from DataStax: "Unless absolutely required by your application, DataStax strongly recommends against using the ordered partitioner."
  25-29. Partitioning and Replication
      Keyspace with RF=3. Partition strategy. Replication strategy.
      [Diagram: a ring with positions labeled 0 through 8. The client sends key A to a coordinator node (position 1), which forwards copies of A to the three replicas R1, R2, and R3 at positions 2, 3, and 4.]
  30. Cassandra CLI / Keyspaces
      CREATE KEYSPACE demo
          WITH placement_strategy = SimpleStrategy
          AND strategy_options:replication_factor = 3;
      CREATE KEYSPACE cache
          WITH placement_strategy = SimpleStrategy
          AND strategy_options:replication_factor = 1
          AND durable_writes = 'false';
  31. Cassandra CLI / Static Columns
      CREATE COLUMN FAMILY users
          WITH comparator = UTF8Type
          AND key_validation_class = UTF8Type
          AND column_metadata = [
              { column_name: name,     validation_class: UTF8Type },
              { column_name: password, validation_class: UTF8Type },
              { column_name: country,  validation_class: UTF8Type },
              { column_name: state,    validation_class: UTF8Type }
          ];
      (static column family)
      SET users['alankay']['name']    = 'Alan Kay';
      SET users['alankay']['state']   = 'CA';
      SET users['alankay']['country'] = 'US';
  32. Cassandra CLI / Dynamic Columns
      CREATE COLUMN FAMILY posts
          WITH comparator = TimeUUIDType
          AND key_validation_class = UTF8Type
          AND default_validation_class = UTF8Type;
      (dynamic column family)
      SET posts['alankay'][timeuuid()] = 'Hello world...';
  33. Cassandra CLI / Counters
      CREATE COLUMN FAMILY page_views
          WITH comparator = UTF8Type
          AND key_validation_class = UTF8Type
          AND default_validation_class = CounterColumnType;
      (counter column family)
      INCR page_views['www.google.com']['about.html'] BY 1;
      INCR page_views['www.google.com']['help.html'] BY 1;
  34. CQL (Cassandra Query Language): SQL-like language. No joins, aggregation, ...
  35. CQL (Cassandra Query Language)
      CREATE KEYSPACE demo
          WITH strategy_class = 'SimpleStrategy'
          AND strategy_options:replication_factor = 3;
      CREATE TABLE users (
          login    varchar PRIMARY KEY,
          name     varchar,
          password varchar,
          country  varchar,
          state    varchar
      );
      CREATE INDEX users_country ON users(country);
      CREATE INDEX users_state   ON users(state);
  36. CQL (Cassandra Query Language)
      INSERT INTO users (login, name, country, state)
          VALUES ('alankay', 'Alan Kay', 'US', 'CA');
      SELECT * FROM users WHERE login = 'alankay';
      SELECT * FROM users WHERE country = 'US' AND state = 'CA';
  37. CQL Counters
      CREATE TABLE login_stats (
          login   varchar,
          success counter,
          failed  counter,
          PRIMARY KEY (login)
      );
      UPDATE login_stats
          SET success = success + 1
          WHERE login = 'alankay';
  38. CQL (Cassandra Query Language)
      Type                CQL             Notes
      BytesType           blob
      AsciiType           ascii
      UTF8Type            text, varchar
      IntegerType         varint          arbitrary-precision
      Int32Type           int             4-byte integer
      LongType            bigint          8-byte integer
      UUIDType            uuid
      TimeUUIDType        timeuuid
      DateType            timestamp       8 bytes
      BooleanType         boolean
      FloatType           float
      DoubleType          double          8 bytes
      DecimalType         decimal         variable-precision
      CounterColumnType   counter         distributed counter
  39. Tunable Consistency
      Levels: Any (only for writes), One, Two, Three, Quorum, Local Quorum, All, Each Quorum
      SELECT * FROM users USING CONSISTENCY QUORUM WHERE ...
      INSERT INTO users (id, name, ..) VALUES (...) USING CONSISTENCY QUORUM
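How many replica acknowledgements each level needs can be sketched in Python. This is a simplified model for a single datacenter: ANY, LOCAL_QUORUM, and EACH_QUORUM are left out because they depend on hints or on per-datacenter replication factors. The function names are hypothetical.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Return how many replicas must respond for the given consistency level."""
    quorum = replication_factor // 2 + 1  # a strict majority of the replicas
    table = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum,
        "ALL": replication_factor,
    }
    if level not in table:
        raise ValueError(f"level not modeled in this sketch: {level}")
    return table[level]

def read_sees_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when every read replica set must overlap every write replica set (R + W > RF)."""
    read = required_acks(read_level, replication_factor)
    write = required_acks(write_level, replication_factor)
    return read + write > replication_factor

if __name__ == "__main__":
    print(required_acks("QUORUM", 3))
    print(read_sees_latest_write("QUORUM", "QUORUM", 3))
    print(read_sees_latest_write("ONE", "ONE", 3))
```

Reading and writing at QUORUM with RF=3 needs two replicas each, so any read set shares at least one replica with any write set; reading and writing at ONE gives no such guarantee.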
  40-45. Consistency Level
      USING CONSISTENCY ONE
      [Diagram: a ring with positions labeled 0 through 8. The client sends key A to the coordinator, which forwards it to replicas R1, R2, and R3. As soon as one replica returns an Ack, the coordinator returns an Ack to the client.]
  46-53. Gossip-based Protocol
      [Diagram: a ring with positions labeled 0 through 8, repeated across eight animation frames.]
  54-60. Hinted Handoff Writes
      Hints are stored for down replicas. If consistency level = ANY, the cluster is always writable.
      [Diagram: a ring with positions labeled 0 through 8; node 3 is down. The client sends key A to the coordinator, which writes to the replicas R1, R2, and R3. Replica R2 is the down node 3, so "Hint 3:A" is recorded for it. A later frame adds a second hint, "Hint 3:B".]
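The hint mechanism can be sketched as a small in-memory Python model. The class, its fields, and the replay method are hypothetical names for illustration; the sketch only shows the idea that a write destined for a down replica is kept as a hint and delivered once the replica is back.

```python
class Coordinator:
    """Toy coordinator that writes to live replicas and keeps hints for down ones."""

    def __init__(self, replicas):
        self.up = {name: True for name in replicas}   # replica name -> is it reachable?
        self.data = {name: {} for name in replicas}   # replica name -> stored key/values
        self.hints = []                               # (target replica, key, value)

    def write(self, key, value) -> int:
        """Write to every replica; return the number of replicas that acknowledged."""
        acks = 0
        for name, is_up in self.up.items():
            if is_up:
                self.data[name][key] = value
                acks += 1
            else:
                self.hints.append((name, key, value))  # remember the write for later
        return acks

    def replay_hints(self) -> None:
        """Deliver stored hints to replicas that are reachable again."""
        remaining = []
        for name, key, value in self.hints:
            if self.up[name]:
                self.data[name][key] = value
            else:
                remaining.append((name, key, value))
        self.hints = remaining

if __name__ == "__main__":
    c = Coordinator(["R1", "R2", "R3"])
    c.up["R2"] = False
    print("acks:", c.write("A", "value"), "hints:", c.hints)
    c.up["R2"] = True
    c.replay_hints()
    print("R2 now holds:", c.data["R2"])
```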
  61-66. Anti-Entropy / Read Repair
      Keyspace with RF=3. read_repair_chance is set per column family.
      [Diagram: a ring with positions labeled 0 through 8. The client reads through a coordinator, which sends a Query to one of the replicas R1, R2, R3 and a DigestQuery to the other two.]
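A minimal Python sketch of the read path in the diagram: full data from one replica, digests from the others, and a repair when they disagree. Replicas are modeled as plain dictionaries, the digest is an MD5 of the value's text form, and conflicts are resolved by keeping the newest timestamp; these modeling choices and the function names are assumptions, and `read_repair_chance` is not modeled.

```python
import hashlib

def digest(value) -> str:
    """Return a short fingerprint of a stored value (or of None when it is missing)."""
    return hashlib.md5(repr(value).encode("utf-8")).hexdigest()

def read_with_repair(replicas, key):
    """replicas: list of dicts mapping key -> (timestamp, value). Returns the value read."""
    data_replica, *digest_replicas = replicas
    result = data_replica.get(key)                       # the full Query
    if all(digest(r.get(key)) == digest(result) for r in digest_replicas):
        return result                                    # every DigestQuery matched
    # Mismatch: collect every version, keep the newest, and write it back everywhere.
    versions = [r[key] for r in replicas if key in r]
    newest = max(versions, key=lambda version: version[0])
    for r in replicas:
        r[key] = newest
    return newest

if __name__ == "__main__":
    r1 = {"k": (1, "old")}
    r2 = {"k": (2, "new")}
    r3 = {"k": (1, "old")}
    print(read_with_repair([r1, r2, r3], "k"))
    print(r1, r3)
```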
  67-68. Anti-Entropy / Node Repair
      Node repair: disk expensive, network efficient.
      [Diagram: a ring with positions labeled 0 through 8; nodes exchange TreeRequest / TreeResponse messages.]
  69. Merkle Trees
      [Diagram: two identical trees side by side. In each, Top Hash is the parent of Hash 1-2 and Hash 3-4; Hash 1-2 is the parent of Hash 1 and Hash 2; Hash 3-4 is the parent of Hash 3 and Hash 4.]
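The two trees in the diagram can be built and compared with a short Python sketch. SHA-256 and the four sample blocks are assumptions for illustration; the sketch requires a power-of-two number of blocks, and the function names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(blocks):
    """Return the tree as a list of levels, leaves first and the top hash last."""
    level = [h(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children concatenated.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def differing_leaves(a, b):
    """Walk two same-shaped trees from the top; return indexes of leaves that differ."""
    top = len(a) - 1
    if a[top][0] == b[top][0]:
        return []                      # equal top hashes: nothing to exchange
    suspects = [0]
    for depth in range(top - 1, -1, -1):
        suspects = [
            child
            for parent in suspects
            for child in (2 * parent, 2 * parent + 1)
            if a[depth][child] != b[depth][child]
        ]
    return suspects

if __name__ == "__main__":
    mine = build_tree([b"1", b"2", b"3", b"4"])
    theirs = build_tree([b"1", b"2", b"X", b"4"])
    print(differing_leaves(mine, theirs))
```

Only the subtrees whose hashes differ are descended into, which is why comparing trees is cheap on the network even though building them requires reading the data from disk.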
  70-78. Multi Datacenter Partitioning
      [Diagram: two token rings, DataCenter 1 and DataCenter 2. A client writes key A, and copies of A then appear on several nodes across the two rings.]
  79. Replica Placement: SimpleStrategy (adjacent nodes)
      CREATE KEYSPACE demo
          WITH strategy_class = 'SimpleStrategy'
          AND strategy_options:replication_factor = 3;
  80. Replica Placement: NetworkTopologyStrategy (replication by datacenter)
      CREATE KEYSPACE demo
          WITH strategy_class = 'NetworkTopologyStrategy'
          AND strategy_options:DC1 = 3
          AND strategy_options:DC2 = 2;
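"Adjacent nodes" can be sketched in Python as: find the node that owns the key's token, then take the next nodes clockwise until the replication factor is reached. The sketch assumes one token per node and ignores racks and datacenters; the ring contents and function name are hypothetical.

```python
from bisect import bisect_left

def simple_strategy_replicas(ring, key_token, replication_factor):
    """ring: list of (token, node) pairs, one token per node. Return the replica nodes in order."""
    ordered = sorted(ring)
    positions = [token for token, _ in ordered]
    start = bisect_left(positions, key_token) % len(ordered)  # owner of the key's token
    count = min(replication_factor, len(ordered))             # cannot exceed the node count
    return [ordered[(start + i) % len(ordered)][1] for i in range(count)]

if __name__ == "__main__":
    ring = [(token, f"n{i + 1}") for i, token in enumerate(range(0, 80, 10))]
    print(simple_strategy_replicas(ring, 15, 3))
    print(simple_strategy_replicas(ring, 65, 3))
```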
  81. Topology Discovery
      SimpleSnitch (single datacenter)
      EC2Snitch (region as datacenter, availability zone as rack)
      PropertyFileSnitch (cassandra-topology.properties)
      RackInferringSnitch (10.DataCenter.Rack.Node)
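The 10.DataCenter.Rack.Node convention of the RackInferringSnitch can be illustrated in Python: the second octet of a node's IP address names the datacenter, the third the rack, and the fourth the node. The function name and the sample address are hypothetical.

```python
def infer_location(ip: str) -> dict:
    """Split an IPv4 address into datacenter, rack, and node following 10.DC.Rack.Node."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"not an IPv4 address: {ip}")
    return {"datacenter": octets[1], "rack": octets[2], "node": octets[3]}

if __name__ == "__main__":
    print(infer_location("10.20.30.40"))
```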
  82. Property File Snitch
      cassandra-topology.properties
      66.160.141.216 = DC1:RAC1
      66.160.141.217 = DC1:RAC1
      66.160.141.218 = DC1:RAC1
      174.129.20.82 = DC2:RAC1
      174.129.20.83 = DC2:RAC1
      174.129.30.60 = DC2:RAC2
      174.129.30.61 = DC2:RAC2
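A small Python sketch of how entries in this format map an address to a datacenter and rack. It handles only the `address = DC:RACK` lines shown on the slide and `#` comment lines; any other syntax the real file may allow is not covered, and the function name is hypothetical.

```python
def parse_topology(text: str) -> dict:
    """Parse 'address = DC:RACK' lines into {address: (datacenter, rack)}."""
    topology = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        address, _, location = line.partition("=")
        datacenter, _, rack = location.strip().partition(":")
        topology[address.strip()] = (datacenter, rack)
    return topology

if __name__ == "__main__":
    sample = """
    66.160.141.216 = DC1:RAC1
    174.129.30.60 = DC2:RAC2
    """
    print(parse_topology(sample))
```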
  83. Wide Rows (Composite Primary Key)
      CREATE TABLE page_views (
          domain varchar,
          page   varchar,
          hits   counter,
          PRIMARY KEY (domain, page)
      );
      UPDATE page_views
          SET hits = hits + 1
          WHERE domain = 'www.google.com' AND page = '/faq.html';
  84. Wide Rows (Composite Primary Key)
      CREATE TABLE metrics (
          name  text,
          day   int,
          value counter,
          PRIMARY KEY (name, day)
      );
      UPDATE metrics
          SET value = value + 1
          WHERE name = 'google.com' AND day = 20121201;
      SELECT * FROM metrics
          WHERE day > 20121201 AND day < 20121205
          AND name = 'google.com';
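Why the range query on `day` is cheap can be sketched with a toy Python model of the `metrics` table: the first primary key column (`name`) selects one wide row, and the second (`day`) is kept sorted inside it, so a range becomes a contiguous slice. The class and method names are hypothetical, and this is a model of the idea, not of Cassandra's storage format.

```python
from bisect import bisect_left, bisect_right, insort

class WideRowTable:
    """Toy model: one sorted list of 'day' values per 'name', plus a counter per cell."""

    def __init__(self):
        self.days_by_name = {}   # name -> sorted list of days
        self.counters = {}       # (name, day) -> counter value

    def increment(self, name, day, amount=1):
        if (name, day) not in self.counters:
            insort(self.days_by_name.setdefault(name, []), day)  # keep days sorted
            self.counters[(name, day)] = 0
        self.counters[(name, day)] += amount

    def slice(self, name, after, before):
        """Return (day, value) pairs with after < day < before, in day order."""
        days = self.days_by_name.get(name, [])
        lo = bisect_right(days, after)
        hi = bisect_left(days, before)
        return [(day, self.counters[(name, day)]) for day in days[lo:hi]]

if __name__ == "__main__":
    table = WideRowTable()
    for day in (20121201, 20121202, 20121202, 20121204, 20121205):
        table.increment("google.com", day)
    print(table.slice("google.com", 20121201, 20121205))
```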
  85. Wide Rows (Composite Primary Key)
      CREATE TABLE tweets (
          tweet_id uuid PRIMARY KEY,
          author   varchar,
          body     varchar
      );
      CREATE TABLE timeline (
          user_id  varchar,
          tweet_id uuid,      // uuid with time as prefix: timeuuid
          author   varchar,
          body     varchar,
          PRIMARY KEY (user_id, tweet_id)
      );
  86. Atomic Batches (1.2+)
      BEGIN BATCH USING CONSISTENCY QUORUM
          INSERT INTO tweets (tweet_id, author, body)
              VALUES (..., 'alan kay', '...')
          INSERT INTO timeline (user_id, tweet_id, author, body)
              VALUES ('other', '...', 'alankay', '...')
      APPLY BATCH
      CREATE TABLE batchlog (
          id         uuid PRIMARY KEY,
          written_at timestamp,
          data       blob
      )
  87. Collections / Sets (1.2+)
      CREATE TABLE users (
          login  text PRIMARY KEY,
          name   text,
          emails set<text>
      );
      INSERT INTO users (login, name, emails)
          VALUES ('alankay', 'Alan Kay', { 'alan@kay.com' });
      UPDATE users
          SET emails = emails + { 'a@b.com' }
          WHERE login = 'alankay';
  88. Collections / Maps (1.2+)
      CREATE TABLE users (
          login      text PRIMARY KEY,
          name       text,
          social_ids map<text, text>
      );
      INSERT INTO users (login, name, social_ids)
          VALUES ('alankay', 'Alan Kay', { 'twitter' : 'alankay' });
      UPDATE users
          SET social_ids['google'] = '+alankay'
          WHERE login = 'alankay';
  89. Collections / Lists (1.2+)
      CREATE TABLE users (
          login       text PRIMARY KEY,
          name        text,
          creditcards list<text>
      );
      INSERT INTO users (login, name, creditcards)
          VALUES ('alankay', 'Alan Kay', [ '1234-' ]);
      UPDATE users
          SET creditcards = creditcards + [ '2345-' ]
          WHERE login = 'alankay';
  90. Cassandra Clients
      Shells: Cassandra-CLI, CQLSH
      Drivers: Java: CQL / JDBC
      Mappings: Java: Apache Gora; Java: Kundera (JPA)
      High Level APIs: Java: Hector Client API; Java: Astyanax (Netflix); Scala: Cassie (Twitter); Python: PyCassa Client API; PHP: PhpCassa Client API
      Low Level: Thrift (multi language)
  91. Thanks. Fernando Rodriguez Olivera. twitter: @frodriguez, mail: frodriguez <at> gmail.com, website: nosqlessentials.com. Next course (Spanish only): Hadoop/HBase/Cassandra/MongoDB, Buenos Aires, 18/19 Dec 2012. Registration: nosqlessentials.com