Introduction to Cassandra

Recent talk I gave at the Wellington Rails User Group.

I tried to build up the model of how and why Cassandra does things.


Transcript

  • 1. Introduction to Cassandra. Wellington Ruby on Rails User Group. Aaron Morton @aaronmorton 24/11/2010
  • 2. Disclaimer. This is an introduction, not a reference.
  • 3. I may, from time to time and for the best possible reasons, bullshit you.
  • 4. What do you already know about Cassandra?
  • 5. Get ready.
  • 6. The next slide has a lot on it.
  • 7. Cassandra is a distributed, fault tolerant, scalable, column oriented data store.
  • 8. A word about “column oriented”.
  • 9. Relax.
  • 10. It’s different to a row oriented DB like MySQL. So...
  • 11. For now, think about keys and values, where each value is a hash / dict.
  • 12. Cassandra’s data model and on disk storage are based on the Google Bigtable paper from 2006.
  • 13. The distributed cluster design is based on theAmazon Dynamo paper from 2007.
  • 14. {‘foo’ => {‘bar’ => ‘baz’}}, i.e. {key => {col_name => col_value}}
  • 15. Easy. Let’s store ‘foo’ somewhere.
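The model on this slide can be played with as a plain Ruby hash of hashes; this is only a toy in-memory sketch of the shape of the data, not Cassandra’s API:

```ruby
# A toy, in-memory version of the key => {col_name => col_value} model.
# The default block creates an empty column hash for each new key.
store = Hash.new { |h, k| h[k] = {} }

store['foo']['bar'] = 'baz'   # write one column under the 'foo' key
store['foo']                  # => {"bar"=>"baz"}
```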
  • 16. foo
  • 17. But I want to be able to read it back if one machine fails.
  • 18. Let’s distribute it on 3 of the 5 nodes I have.
  • 19. This is the Replication Factor. Called RF or N.
  • 20. Each node has a token that identifies the upper value of the key range it is responsible for.
  • 21. [Ring diagram] Node tokens: #1 <= E, #2 <= J, #3 <= O, #4 <= T, #5 <= Z
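The token lookup on this ring can be sketched in a few lines of Ruby; the `node_for` helper is invented for illustration, using the example tokens above:

```ruby
# Hypothetical sketch of token lookup: each node owns keys up to and
# including its token; keys past the highest token wrap around to the
# node with the lowest token.
TOKENS = { 'E' => 1, 'J' => 2, 'O' => 3, 'T' => 4, 'Z' => 5 }

def node_for(key)
  owner = TOKENS.keys.sort.find { |t| key.upcase <= t }
  TOKENS[owner || TOKENS.keys.sort.first]
end

node_for('foo')  # => 2, since 'FOO' falls in the (E..J] range
```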
  • 22. Client connects to a random node and asks it to coordinate storing the ‘foo’ key.
  • 23. Each node knows about all other nodes in the cluster, including their tokens.
  • 24. This is achieved using a Gossip protocol. Every second each node shares its full view of the cluster with 1 to 3 other nodes.
  • 25. Our coordinator is node 5. It knows node 2 is responsible for the ‘foo’ key.
  • 26. [Ring diagram] The client connects to coordinator #5; the ‘foo’ key is stored on node #2 (<= J).
  • 27. But there is a problem...
  • 28. What if we have lots of values between F and J?
  • 29. We end up with a “hot” section in our ring of nodes.
  • 30. That’s bad mmmkay?
  • 31. You shouldn’t have a hot section in your ring. mmmkay?
  • 32. A Partitioner is used to apply a transform to the key. The transformed values are also used to define a node’s range.
  • 33. The Random Partitioner applies an MD5 transform. The range of all possible key values is changed to a 128 bit number.
  • 34. There are other Partitioners, such as the Order Preserving Partitioner. But start with the Random Partitioner.
  • 35. Let’s pretend all keys are now transformed to an integer between 0 and 9.
  • 36. Our 5 node cluster now looks like this.
  • 37. [Ring diagram] Node tokens: #1 <= 2, #2 <= 4, #3 <= 6, #4 <= 8, #5 <= 0
  • 38. Pretend our ‘foo’ key transforms to 3.
  • 39. [Ring diagram] The client connects to coordinator #5; the transformed key "3" goes to node #2 (<= 4).
  • 40. Good start.
  • 41. But where are the replicas? We want to replicate the ‘foo’ key 3 times.
  • 42. A Replication Strategy is used to determine which nodes should store replicas.
  • 43. It’s also used to work out which nodes should have a value when reading.
  • 44. Simple Strategy orders the nodes by their token and places the replicas around the ring.
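Simple Strategy’s placement rule can be sketched in Ruby; the `replicas_for` helper and its argument shapes are invented, using the example ring’s tokens:

```ruby
# Simple Strategy sketch: sort nodes by token, find the node whose
# range covers the key's token, then take the next RF - 1 nodes
# walking around the ring (wrapping past the highest token).
def replicas_for(token, node_tokens, rf)
  ring  = node_tokens.sort_by { |_node, t| t }       # [[node, token], ...]
  start = ring.index { |_node, t| token <= t } || 0  # owner, with wrap
  rf.times.map { |i| ring[(start + i) % ring.size].first }
end

nodes = { 1 => 2, 2 => 4, 3 => 6, 4 => 8, 5 => 0 }
replicas_for(3, nodes, 3)  # => [2, 3, 4], as in the example ring
```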
  • 45. Network Topology Strategy is aware of the racks and Data Centres your servers are in. It can split replicas between DCs.
  • 46. Simple Strategy will do in most cases.
  • 47. Our coordinator will send the write to all 3 nodes at once.
  • 48. [Ring diagram] Coordinator #5 sends the write of "3" to nodes #2, #3 and #4.
  • 49. Once the 3 replicas tell the coordinator they have finished, it will tell the client the write completed.
  • 50. Done. Let’s go home.
  • 51. Hang on. What about fault tolerance? What if node #4 is down?
  • 52. [Ring diagram] The same write, but node #4 is down.
  • 53. The client must specify a Consistency Level for each operation.
  • 54. Consistency Level specifies how many nodes must agree before the operation is a success.
  • 55. For reads it is known as R. For writes it is known as W.
  • 56. Here are the simple ones (there are a few more)...
  • 57. One. The coordinator will only wait for one node to acknowledge the write.
  • 58. Quorum. N/2 + 1.
  • 59. All.
  • 60. The cluster will work to eventually make all copies of the data consistent.
  • 61. To get consistent behaviour make sure that R + W > N. You can do this by...
  • 62. Always using Quorum for reads and writes. Or...
  • 63. Use All for writes and One for reads. Or...
  • 64. Use All for reads and One for writes.
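The arithmetic behind those three options is small enough to check in Ruby; the helper names here are invented for illustration:

```ruby
# Quorum is N/2 + 1, and a read/write pairing is strongly
# consistent when R + W > N.
def quorum(n)
  n / 2 + 1          # integer division, so quorum(3) == 2
end

def consistent?(r, w, n)
  r + w > n
end

n = 3
consistent?(quorum(n), quorum(n), n)  # Quorum reads + Quorum writes => true
consistent?(1, n, n)                  # One for reads, All for writes => true
consistent?(n, 1, n)                  # All for reads, One for writes => true
consistent?(1, 1, n)                  # One for both => false
```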
  • 65. Try our write again, using Quorum consistency level.
  • 66. The coordinator will wait for 2 nodes to complete the write before telling the client it has completed.
  • 67. [Ring diagram] Nodes #2 and #3 complete the write of "3"; node #4 is still down.
  • 68. What about when node 4 comes online?
  • 69. It will not have our “foo” key.
  • 70. Won’t somebody please think of the “foo” key!?
  • 71. During our write the coordinator will send a Hinted Handoff to one of the online replicas.
  • 72. Hinted Handoff tells the node that one of the replicas was down and needs to be updated later.
  • 73. [Ring diagram] Nodes #2 and #3 store "3"; node #3 also stores a hint: send "3" to #4.
  • 74. When node 4 comes back up, node 3 will eventually process the Hinted Handoffs and send the “foo” key to it.
  • 75. [Ring diagram] Node #4 is back up and all three replicas hold "3".
  • 76. What if the “foo” key is read before the Hinted Handoff is processed?
  • 77. [Ring diagram] A read finds "3" on nodes #2 and #3 but nothing on #4; the hint for #4 is still pending.
  • 78. At our Quorum CL the coordinator asks all nodes that should have replicas to perform the read.
  • 79. Once CL nodes have returned, their values are compared.
  • 80. If they do not match, a Read Repair process is kicked off.
  • 81. A timestamp provided by the client during the write is used to determine the “latest” value.
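Picking the winner by timestamp is a one-liner; this Ruby sketch invents a reply structure just to show the comparison:

```ruby
# Read Repair sketch (reply shape invented): keep the value with the
# highest client-supplied timestamp; stale replicas get rewritten.
replies = [
  { node: 2, value: '3', timestamp: 100 },
  { node: 3, value: '3', timestamp: 100 },
  { node: 4, value: nil, timestamp: 0 },   # missed the write
]

latest = replies.max_by { |r| r[:timestamp] }
latest[:value]  # => "3", the value node 4 will be repaired with
```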
  • 82. The “foo” key is written to node 4, and consistency achieved, before the coordinator returns to the client.
  • 83. At lower CLs the Read Repair happens in the background and is probabilistic.
  • 84. We can force Cassandra to repair everything using the Anti Entropy feature.
  • 85. Anti Entropy is the main feature for achieving consistency. Read Repair and Hinted Handoff are optimisations.
  • 86. Anti Entropy is started manually via the command line or Java JMX.
  • 87. Great so far.
  • 88. But ratemylolcats.com is going to be huge. How do I store 100 Million pictures of cats?
  • 89. Add more nodes.
  • 90. More disk capacity, disk IO,memory, CPU, network IO. More everything.
  • 91. Linear scaling.
  • 92. Clusters of 100+ TB.
  • 93. And now for the data model.
  • 94. From the outside in.
  • 95. A Keyspace is the container for everything in your application.
  • 96. Keyspaces can be thought of as Databases.
  • 97. A Column Family is a container for ordered and indexed Columns.
  • 98. Columns have a name, value, and timestamp provided by the client.
  • 99. The CF indexes the columns by name and supports get operations by name.
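A Column Family can be pictured as rows of name-indexed columns; this Ruby sketch uses the slide deck’s example data, and the by-name get and slice are only an analogy for the real operations:

```ruby
# A Column Family as rows of name-indexed columns. Because columns
# are indexed by name, a get by name or a slice of a name range is cheap.
user_cf = {
  'fred' => { 'd_o_b' => '04/03', 'username' => 'fred' },
  'bob'  => { 'city' => 'wellington', 'username' => 'bob' },
}

user_cf['fred']['username']                        # get one column by name
user_cf['bob'].select { |name, _| name <= 'city' } # a name-range slice
```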
  • 100. CFs do not define which columns can be stored in them.
  • 101. Column Families have a large memory overhead.
  • 102. You typically have few (<10) CFs in your Keyspace. But there is no limit.
  • 103. We have Rows. Rows have a key.
  • 104. Rows store columns in one or more Column Families.
  • 105. Different rows can store different columns in the same Column Family.
  • 106. User CF:
    key => fred: { username => fred, d_o_b => 04/03 }
    key => bob: { username => bob, city => wellington }
  • 107. A key can store different columns in different Column Families.
  • 108. User CF:
    key => fred: { username => fred, d_o_b => 04/03 }
    Timeline CF:
    key => fred: { 09:01 => tweet_60, 09:02 => tweet_70 }
  • 109. Here comes the Super Column Family to ruin it all.
  • 110. Arrgggghhhhh.
  • 111. A Super Column Family is a container for ordered and indexed Super Columns.
  • 112. A Super Column has a name and an ordered and indexed list of Columns.
  • 113. So the Super Column Family just gives another level to our hash.
  • 114. Social Super CF:
    key => fred: { following => { bob => 01/01/2010, tom => 01/02/2010 },
                   followers => { bob => 01/01/2010 } }
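That extra level is easy to see as nested Ruby hashes; this is only a picture of the shape, not how Cassandra is queried:

```ruby
# The Social Super CF from the slide as one more level of hash:
# key => { super_column_name => { column_name => column_value } }
social = {
  'fred' => {
    'following' => { 'bob' => '01/01/2010', 'tom' => '01/02/2010' },
    'followers' => { 'bob' => '01/01/2010' },
  },
}

social['fred']['following'].keys  # => ["bob", "tom"]
```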
  • 115. How about some code?