NoSQL Essentials: Cassandra

Published in: Technology

  1. 1. NoSQL Essentials: Cassandra & Dynamo-like Databases. Buenos Aires, Argentina, Nov 2012. Fernando Rodriguez Olivera, @frodriguez, nosqlessentials.com
  2. 2. Hash Partitioning (slides 2-9). Four nodes A, B, C and D are numbered 0 to 3, so N = 4. The client stores each key on node hash(key) mod N:
        hash("hello") mod 4 = 2  (node C)
        hash("world") mod 4 = 0  (node A)
        hash("bye")   mod 4 = 3  (node D)
      Drawback: it is difficult to add or remove nodes, because changing N changes the result of the modulo for most keys.
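
The cost of changing N in modulo partitioning can be made concrete with a small sketch. It is not from the slides: the MD5-based `stable_hash` helper and the synthetic key names are assumptions, chosen only so that the result is reproducible.

```python
import hashlib

def stable_hash(key):
    """Deterministic hash (MD5 is an assumption; any uniform hash would do)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

def node_for(key, n_nodes):
    """Modulo partitioning: node index = hash(key) mod N."""
    return stable_hash(key) % n_nodes

def moved_fraction(keys, old_n, new_n):
    """Fraction of keys whose node changes when the cluster is resized."""
    moved = sum(1 for k in keys if node_for(k, old_n) != node_for(k, new_n))
    return moved / len(keys)

keys = [f"key-{i}" for i in range(10_000)]  # hypothetical keys
fraction = moved_fraction(keys, 4, 5)
print(f"keys that change node when N goes from 4 to 5: {fraction:.0%}")
```

With a uniform hash, a key keeps its node only when hash mod 20 is 0, 1, 2 or 3, so roughly four out of five keys are expected to move when a fifth node is added.
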
  10. 10. Consistent Hashing / Random Tokens (slides 10-20). Example address space 0..Max = 0..65535, with a hash function whose range is 0..Max. The ring is drawn with marks at 0/65535, 16384, 32768 and 49152. Each node picks a random token:
        token A = 33015
        token B =  8915
        token C = 31541
        token D = 40927
      Each key is stored on the node that owns the first token at or after the key's hash, moving clockwise and wrapping around at Max:
        hash("hello") = 13209  (node C)
        hash("world") = 36551  (node D)
        hash("bye")   = 60912  (wraps around to node B)
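
The ring lookup can be sketched in a few lines. The tokens and key hashes below are copied from the slides; the `owner` function, the clockwise "first token at or after the hash" rule as coded here, and the extra node E are illustrative assumptions rather than Cassandra's actual implementation.

```python
import bisect

# Tokens copied from the slides (address space 0..65535).
TOKENS = {"A": 33015, "B": 8915, "C": 31541, "D": 40927}

def build_ring(tokens):
    """Return (sorted token list, node names in the same order)."""
    pairs = sorted((token, node) for node, token in tokens.items())
    return [t for t, _ in pairs], [n for _, n in pairs]

def owner(key_hash, tokens):
    """Node with the first token >= key_hash, wrapping past the maximum."""
    ring_tokens, ring_nodes = build_ring(tokens)
    idx = bisect.bisect_left(ring_tokens, key_hash)
    return ring_nodes[idx % len(ring_nodes)]

for key, key_hash in [("hello", 13209), ("world", 36551), ("bye", 60912)]:
    print(key, "->", owner(key_hash, TOKENS))

# Adding a hypothetical node E splits only the range that contains its token.
WITH_E = dict(TOKENS, E=20000)
```

With the slide's numbers this places "hello" on C, "world" on D and "bye" on B. After adding E at token 20000, only keys hashing into (8915, 20000] change owner, which is the property that modulo partitioning lacks.
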
  21. 21. Consistent Hashing / Virtual Nodes. Each physical node (A, B, C, D) owns 4 virtual nodes with random tokens (token A.1, token A.2, token A.3, token A.4, token B.1, ...), so the four nodes are interleaved around the ring 0..65535.
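
A minimal sketch of virtual-node token assignment, assuming tokens are drawn uniformly at random from the 0..65535 example space. The function name, the fixed seed and the `A.1`-style labels are illustrative choices, not Cassandra's own allocation code.

```python
import random

def assign_vnode_tokens(nodes, vnodes_per_node=4, max_token=65535, seed=42):
    """Give every physical node several distinct random tokens on the ring."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    all_tokens = rng.sample(range(max_token + 1), len(nodes) * vnodes_per_node)
    return {
        node: sorted(all_tokens[i * vnodes_per_node:(i + 1) * vnodes_per_node])
        for i, node in enumerate(nodes)
    }

vnode_tokens = assign_vnode_tokens(["A", "B", "C", "D"])

# Flatten into ring order, labelled A.1, A.2, ... as on the slide.
ring = sorted(
    (token, f"{node}.{i + 1}")
    for node, tokens in vnode_tokens.items()
    for i, token in enumerate(tokens)
)
for token, label in ring:
    print(f"{token:>5}  {label}")
```

Because each physical node appears at several points on the ring, a node that joins or leaves exchanges data with many peers instead of a single neighbour.
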
  22. 22. Consistent Hashing / Manual Placement. Tokens are calculated so that nodes are spread uniformly around the ring (diagram: ring positions 0 to 8). Adding or removing a node requires rebalancing.
  23. 23. Token Generation
        #> cassandra/tools/bin/token-generator
        Token Generator Interactive Mode
        --------------------------------
        How many datacenters will participate in this Cassandra cluster? 1
        How many nodes are in datacenter #1? 5
        DC #1:
          Node #1:                                        0
          Node #2:   34028236692093846346337460743176821145
          Node #3:   68056473384187692692674921486353642290
          Node #4:  102084710076281539039012382229530463435
          Node #5:  136112946768375385385349842972707284580
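
The listed tokens are consistent with dividing a 2**127 token space into equal steps. The sketch below reproduces them under that assumption; it is not the source of the token-generator tool, and `evenly_spaced_tokens` is a hypothetical name.

```python
RING_SIZE = 2 ** 127  # assumed token space, inferred from the values on the slide

def evenly_spaced_tokens(node_count):
    """Token i is i * floor(RING_SIZE / node_count)."""
    step = RING_SIZE // node_count
    return [i * step for i in range(node_count)]

for number, token in enumerate(evenly_spaced_tokens(5), start=1):
    print(f"Node #{number}: {token:>40}")
```

Python integers have arbitrary precision, so no special big-number type is needed for tokens of this size.
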
  24. 24. Partitioning Strategy. Options: RandomPartitioner (consistent hashing) or ByteOrderedPartitioner. Cassandra documentation from DataStax: "Unless absolutely required by your application, DataStax strongly recommends against using the ordered partitioner".
  25. 25. Partitioning and Replication (slides 25-29). Keyspace with RF=3. The client sends a write for key A to a coordinator (ring position 1). The partition strategy and the replication strategy determine the replicas, here R1, R2 and R3 at positions 2, 3 and 4, and the coordinator forwards A to each of them.
  30. 30. Cassandra CLI / Keyspaces
        CREATE KEYSPACE demo
          WITH placement_strategy = SimpleStrategy
          AND strategy_options:replication_factor = 3;

        CREATE KEYSPACE cache
          WITH placement_strategy = SimpleStrategy
          AND strategy_options:replication_factor = 1
          AND durable_writes = 'false';
  31. 31. Cassandra CLI / Static Columns (static column family)
        CREATE COLUMN FAMILY users
          WITH comparator = UTF8Type
          AND key_validation_class = UTF8Type
          AND column_metadata = [
            { column_name: name,     validation_class: UTF8Type },
            { column_name: password, validation_class: UTF8Type },
            { column_name: country,  validation_class: UTF8Type },
            { column_name: state,    validation_class: UTF8Type }
          ];

        SET users['alankay']['name']    = 'Alan Kay';
        SET users['alankay']['state']   = 'CA';
        SET users['alankay']['country'] = 'US';
  32. 32. Cassandra CLI / Dynamic Columns (dynamic column family)
        CREATE COLUMN FAMILY posts
          WITH comparator = TimeUUIDType
          AND key_validation_class = UTF8Type
          AND default_validation_class = UTF8Type;

        SET posts['alankay'][timeuuid()] = 'Hello world...';
  33. 33. Cassandra CLI / Counters (counter column family)
        CREATE COLUMN FAMILY page_views
          WITH comparator = UTF8Type
          AND key_validation_class = UTF8Type
          AND default_validation_class = CounterType;

        INCR page_views['www.google.com']['about.html'] BY 1;
        INCR page_views['www.google.com']['help.html']  BY 1;
  34. 34. CQL (Cassandra Query Language). SQL-like language. No joins, aggregation, ...
  35. 35. CQL (Cassandra Query Language)
        CREATE KEYSPACE demo
          WITH strategy_class = 'SimpleStrategy'
          AND strategy_options:replication_factor = 3;

        CREATE TABLE users (
          login    varchar PRIMARY KEY,
          name     varchar,
          password varchar,
          country  varchar,
          state    varchar
        );

        CREATE INDEX users_country ON users(country);
        CREATE INDEX users_state   ON users(state);
  36. 36. CQL (Cassandra Query Language)
        INSERT INTO users (login, name, country, state)
          VALUES ('alankay', 'Alan Kay', 'US', 'CA');

        SELECT * FROM users WHERE login = 'alankay';

        SELECT * FROM users WHERE country = 'US' AND state = 'CA';
  37. 37. CQL Counters
        CREATE TABLE login_stats (
          login   varchar,
          success counter,
          failed  counter,
          PRIMARY KEY (login)
        );

        UPDATE login_stats SET success = success + 1 WHERE login = 'alankay';
  38. 38. CQL (Cassandra Query Language): internal types and their CQL names
        Type                CQL             Notes
        BytesType           blob
        AsciiType           ascii
        UTF8Type            text, varchar
        IntegerType         varint          arbitrary-precision
        Int32Type           int             4-bytes integer
        LongType            bigint          8-bytes integer
        UUIDType            uuid
        TimeUUIDType        timeuuid
        DateType            timestamp       8-bytes
        BooleanType         boolean
        FloatType           float
        DoubleType          double          8-bytes
        DecimalType         decimal         variable-precision
        CounterColumnType   counter         distributed counter
  39. 39. Tunable Consistency. Levels: Any (only for writes), One, Two, Three, Quorum, Local Quorum, Each Quorum, ALL.
        SELECT * FROM users USING CONSISTENCY QUORUM WHERE ...
        INSERT INTO users (id, name, ..) VALUES (...) USING CONSISTENCY QUORUM
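
The arithmetic behind the Quorum level is not spelled out on the slide. The sketch below uses the usual Dynamo-style rule, stated here as an assumption about how the levels are normally reasoned about: a quorum is a majority of the replicas, and a read is guaranteed to see the latest acknowledged write when the read and write replica counts sum to more than RF. The function names are illustrative.

```python
def quorum(replication_factor):
    """Majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def read_sees_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set overlaps every write set: R + W > RF."""
    return read_replicas + write_replicas > replication_factor

rf = 3
print("quorum for RF=3:", quorum(rf))
print("QUORUM reads + QUORUM writes overlap:",
      read_sees_latest_write(quorum(rf), quorum(rf), rf))
print("ONE reads + ONE writes overlap:",
      read_sees_latest_write(1, 1, rf))
```

With RF=3, Quorum reads and Quorum writes each touch 2 replicas, so at least one replica is common to both; One/One gives no such guarantee.
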
  40. 40. Consistency Level (slides 40-45). The client writes A USING CONSISTENCY ONE. The coordinator (ring position 1) sends A to the replicas R1, R2 and R3 (positions 2, 3 and 4). As soon as one replica returns an Ack, the coordinator returns an Ack to the client.
  46. 46. Gossip-based Protocol (slides 46-53). [Animated ring diagram, positions 0 to 8; the slides carry no other text.]
  54. 54. Hinted Handoff Writes (slides 54-60). The client writes A through the coordinator (ring position 1) to replicas R1, R2 and R3 (positions 2, 3 and 4). The node at position 3 (R2) is down, so a hint (3:A) is stored for it; the final slide also shows a second hint, 3:B. Hints are stored for down replicas. If consistency level = ANY, the cluster is always writable.
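
A toy model of the write path with hints, under stated assumptions: replicas are plain identifiers, `up` is a hypothetical liveness map, and hints are kept as in-memory tuples. It only illustrates the rule on the slides (hints for down replicas, ANY always writable) and ignores hint replay, timeouts and everything else a real coordinator does.

```python
def coordinate_write(key, value, replicas, up, required_acks, allow_hint_only=False):
    """Send a write to every replica and store a hint for each one that is down.

    Returns (success, acks, hints), where hints is a list of
    (replica, key, value) tuples to replay once the replica is back.
    """
    acks = [r for r in replicas if up[r]]
    hints = [(r, key, value) for r in replicas if not up[r]]
    if allow_hint_only:
        # Models consistency level ANY: a stored hint is enough.
        success = bool(acks) or bool(hints)
    else:
        success = len(acks) >= required_acks
    return success, acks, hints

replicas = [2, 3, 4]                      # ring positions of R1, R2, R3
up = {2: True, 3: False, 4: True}         # position 3 is down, as on the slides
print(coordinate_write("A", "value", replicas, up, required_acks=1))

all_down = {2: False, 3: False, 4: False}
print(coordinate_write("A", "value", replicas, all_down, required_acks=1))
print(coordinate_write("A", "value", replicas, all_down, required_acks=1,
                       allow_hint_only=True))
```
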
  61. 61. Anti-Entropy / Read Repair (slides 61-66). Keyspace with RF=3. On a read, the coordinator (ring position 1) sends a Query to one replica and a DigestQuery to each of the other two (R1, R2 and R3 at positions 2, 3 and 4). read_repair_chance is configured per column family.
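
The Query / DigestQuery exchange can be sketched as follows. This is a simplified model, not Cassandra's code: replicas are dictionaries, each value carries a timestamp, MD5 over `repr` stands in for the digest, the newest timestamp wins, and `read_repair_chance` is not modelled (the repair always runs).

```python
import hashlib

def digest(cell):
    """Short fingerprint of a (timestamp, value) cell, or of None if missing."""
    return hashlib.md5(repr(cell).encode("utf-8")).hexdigest()

def read_with_repair(replica_data, key):
    """replica_data maps replica name -> {key: (timestamp, value)}.

    Reads the full cell from the first replica and only digests from the
    others. On a mismatch, every replica is rewritten with the newest cell.
    """
    names = list(replica_data)
    full = replica_data[names[0]].get(key)
    digests = [digest(replica_data[name].get(key)) for name in names[1:]]
    if all(d == digest(full) for d in digests):
        return full[1] if full else None
    # Mismatch: fetch the full versions and keep the highest timestamp.
    versions = [replica_data[n][key] for n in names if key in replica_data[n]]
    newest = max(versions, key=lambda cell: cell[0])
    for name in names:
        replica_data[name][key] = newest   # repair stale replicas
    return newest[1]

cluster = {
    "R1": {"A": (1, "old")},
    "R2": {"A": (2, "new")},
    "R3": {"A": (1, "old")},
}
print(read_with_repair(cluster, "A"))
```

Sending digests instead of full values keeps the common case (all replicas agree) cheap on the network; full data is only exchanged when a digest differs.
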
  67. 67. Anti-Entropy / Node Repair (slides 67-68). Nodes exchange TreeRequest / TreeResponse messages. Node repair is disk expensive but network efficient.
  69. 69. Merkle Trees. Two trees are shown side by side. Each has a Top Hash; below it Hash 1-2 and Hash 3-4; and below those the leaves Hash 1, Hash 2, Hash 3 and Hash 4.
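
A minimal sketch of how two such trees can be compared, assuming four data ranges per replica and SHA-256 as the hash (both are illustrative choices, not taken from the slides). Only subtrees whose hashes differ are descended into, which is why the comparison needs little network traffic.

```python
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return the tree levels, leaf hashes first and the top hash last.

    The number of leaves is assumed to be a power of two.
    """
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def differing_leaves(tree_a, tree_b):
    """Walk down from the top hash, following only subtrees that differ."""
    top = len(tree_a) - 1
    suspects = [0] if tree_a[top][0] != tree_b[top][0] else []
    for depth in range(top - 1, -1, -1):
        suspects = [
            child
            for parent in suspects
            for child in (2 * parent, 2 * parent + 1)
            if tree_a[depth][child] != tree_b[depth][child]
        ]
    return suspects

replica_1 = [b"range-1", b"range-2", b"range-3", b"range-4"]
replica_2 = [b"range-1", b"range-2", b"range-3 (stale)", b"range-4"]
print(differing_leaves(build_tree(replica_1), build_tree(replica_2)))
```

The returned indexes are zero-based, so index 2 corresponds to "Hash 3" in the slide's numbering.
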
  70. 70. Multi Datacenter Partitioning (slides 70-78). [Animated diagram: two rings, DataCenter 1 and DataCenter 2. A client writes A, and copies of A then appear on nodes in both datacenters.]
  79. 79. Replica Placement: SimpleStrategy (adjacent nodes)
        CREATE KEYSPACE demo
          WITH strategy_class = 'SimpleStrategy'
          AND strategy_options:replication_factor = 3;
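
"Adjacent nodes" can be read as: take the node that owns the key and continue clockwise until RF nodes have been collected. The sketch below encodes that reading; the function name and the list-of-positions ring are assumptions made for illustration.

```python
def simple_strategy_replicas(ring_nodes, primary_index, replication_factor):
    """Primary node plus the next nodes clockwise, never more than the ring holds."""
    count = min(replication_factor, len(ring_nodes))
    return [ring_nodes[(primary_index + i) % len(ring_nodes)] for i in range(count)]

ring_nodes = [1, 2, 3, 4, 5, 6, 7, 8]     # ring positions, as in the diagrams
print(simple_strategy_replicas(ring_nodes, primary_index=1, replication_factor=3))
print(simple_strategy_replicas(ring_nodes, primary_index=7, replication_factor=3))
```

The first call yields positions 2, 3 and 4, matching R1, R2 and R3 in the replication diagrams; the second shows the wrap-around at the end of the ring.
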
  80. 80. Replica Placement: NetworkTopologyStrategy (replication by datacenter)
        CREATE KEYSPACE demo
          WITH strategy_class = 'NetworkTopologyStrategy'
          AND strategy_options:DC1 = 3
          AND strategy_options:DC2 = 2;
  81. 81. Topology Discovery
        SimpleSnitch (single datacenter)
        EC2Snitch (region as datacenter, availability zone as rack)
        PropertyFileSnitch (cassandra-topology.properties)
        RackInferringSnitch (10.DataCenter.Rack.Node)
  82. 82. Property File Snitch: cassandra-topology.properties
        66.160.141.216 = DC1:RAC1
        66.160.141.217 = DC1:RAC1
        66.160.141.218 = DC1:RAC1
        174.129.20.82 = DC2:RAC1
        174.129.20.83 = DC2:RAC1
        174.129.30.60 = DC2:RAC2
        174.129.30.61 = DC2:RAC2
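
Two of the snitches listed above map an address to a datacenter and rack in ways that are easy to sketch. The parser below handles only the `ip = DC:RACK` lines shown on the slide, and `infer_from_ip` follows the 10.DataCenter.Rack.Node pattern; both functions are hypothetical helpers, not part of Cassandra.

```python
def parse_topology(text):
    """Parse 'ip = DC:RACK' lines into {ip: (datacenter, rack)}."""
    topology = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ip, _, location = line.partition("=")
        datacenter, _, rack = location.strip().partition(":")
        topology[ip.strip()] = (datacenter, rack)
    return topology

def infer_from_ip(ip):
    """RackInferringSnitch-style reading: 2nd octet = datacenter, 3rd = rack."""
    octets = ip.split(".")
    return octets[1], octets[2]

SAMPLE = """
66.160.141.216 = DC1:RAC1
174.129.20.82 = DC2:RAC1
174.129.30.60 = DC2:RAC2
"""
print(parse_topology(SAMPLE))
print(infer_from_ip("10.20.30.40"))
```
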
  83. 83. Wide Rows (Composite Primary Key)
        CREATE TABLE page_views (
          domain varchar,
          page   varchar,
          hits   counter,
          PRIMARY KEY (domain, page)
        );

        UPDATE page_views SET hits = hits + 1
          WHERE domain = 'www.google.com' AND page = '/faq.html';
  84. 84. Wide Rows (Composite Primary Key)
        CREATE TABLE metrics (
          name  text,
          day   int,
          value counter,
          PRIMARY KEY (name, day)
        );

        UPDATE metrics SET value = value + 1
          WHERE name = 'google.com' AND day = 20121201;

        SELECT * FROM metrics
          WHERE day > 20121201 AND day < 20121205
          AND name = 'google.com';
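
Why the range query on `day` works can be illustrated with an in-memory model. This is a sketch under the assumption that the first primary-key column selects a partition and the second keeps cells sorted inside it; `WideRowTable` is a hypothetical class, not a driver API.

```python
import bisect
from collections import defaultdict

class WideRowTable:
    """Partition key -> cells kept sorted by clustering key, as in PRIMARY KEY (name, day)."""

    def __init__(self):
        self._rows = defaultdict(list)    # name -> sorted list of [day, value]

    def increment(self, name, day, delta=1):
        row = self._rows[name]
        idx = bisect.bisect_left(row, [day])
        if idx < len(row) and row[idx][0] == day:
            row[idx][1] += delta
        else:
            row.insert(idx, [day, delta])

    def range_query(self, name, after_day, before_day):
        """Cells with after_day < day < before_day, read from one partition."""
        row = self._rows.get(name, [])
        lo = bisect.bisect_right(row, [after_day, float("inf")])
        hi = bisect.bisect_left(row, [before_day])
        return [tuple(cell) for cell in row[lo:hi]]

metrics = WideRowTable()
for day in (20121201, 20121202, 20121202, 20121203, 20121204, 20121205):
    metrics.increment("google.com", day)
print(metrics.range_query("google.com", 20121201, 20121205))
```

Because the cells of one partition are stored in clustering-key order, the range query is a contiguous slice rather than a scan over all keys.
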
  85. 85. Wide Rows (Composite Primary Key)
        CREATE TABLE tweets (
          tweet_id uuid PRIMARY KEY,
          author   varchar,
          body     varchar
        );

        CREATE TABLE timeline (
          user_id  varchar,
          tweet_id uuid,      // uuid with time as prefix: timeuuid
          author   varchar,
          body     varchar,
          PRIMARY KEY (user_id, tweet_id)
        );
  86. 86. Atomic Batches (1.2+)
        BEGIN BATCH USING CONSISTENCY QUORUM
          INSERT INTO tweets (user_id, tweet_id, author, body)
            VALUES ('alankay', ..., 'alan kay', '...')
          INSERT INTO timeline (user_id, tweet_id, author, body)
            VALUES ('other', '...', 'alankay', '...')
        APPLY BATCH

        CREATE TABLE batchlog (
          id         uuid PRIMARY KEY,
          written_at timestamp,
          data       blob
        )
  87. 87. Collections / Sets (1.2+)
        CREATE TABLE users (
          login  text PRIMARY KEY,
          name   text,
          emails set<text>
        );

        INSERT INTO users (login, name, emails)
          VALUES ('alankay', 'Alan Kay', { 'alan@kay.com' });

        UPDATE users SET emails = emails + { 'a@b.com' }
          WHERE login = 'alankay';
  88. 88. Collections / Maps (1.2+)
        CREATE TABLE users (
          login      text PRIMARY KEY,
          name       text,
          social_ids map<text, text>
        );

        INSERT INTO users (login, name, social_ids)
          VALUES ('alankay', 'Alan Kay', { 'twitter' : 'alankay' });

        UPDATE users SET social_ids['google'] = '+alankay'
          WHERE login = 'alankay';
  89. 89. Collections / Lists (1.2+)
        CREATE TABLE users (
          login       text PRIMARY KEY,
          name        text,
          creditcards list<text>
        );

        INSERT INTO users (login, name, creditcards)
          VALUES ('alankay', 'Alan Kay', [ '1234-' ]);

        UPDATE users SET creditcards = creditcards + [ '2345-' ]
          WHERE login = 'alankay';
  90. 90. Cassandra Clients
        Shells:          Cassandra-CLI, CQLSH
        Drivers:         Java: CQL / JDBC
        Mappings:        Java: Apache Gora; Java: Kundera (JPA)
        High Level APIs: Java: Hector Client API; Java: Astyanax (Netflix); Scala: Cassie (Twitter); Python: PyCassa Client API; PHP: PhpCassa Client API
        Low Level:       Thrift (multi language)
  91. 91. Thanks. Fernando Rodriguez Olivera. twitter: @frodriguez, mail: frodriguez <at> gmail.com, website: nosqlessentials.com. Next course (Spanish only): Hadoop/HBase/Cassandra/MongoDB, Buenos Aires, 18/19 Dec 2012. Registration: nosqlessentials.com
