Cassandra advanced course, in the spirit of most third-year undergraduate Computer Science courses. This presentation explores some of the newer features of Apache (DataStax) Cassandra, such as the Paxos consensus algorithm and the new data types, tuples and user-defined types.
5. Planning your Data Model
@VictorFAnjos
@Cassandra
@FicstarSoftware
@BrightlaneInc
Start with Queries
Denormalize to Optimize
Planning for Concurrent Writes
Who is jbellis a subscriber of?
Blog entry will have a body, user and category.
10. Enter Paxos
Light Weight Transactions
Prepares a proposal that is sent to a number of Acceptors.
Waits for an acknowledgement (in the form of a promise) from the Acceptors.
Sends an accept message to a Quorum of Acceptors with the new value to commit.
Returns success/completion to the client.
Determines if proposal is newer than what it has seen.
Acknowledges (agrees) with the highest proposal value it has seen AND the current value (of what is to be set).
Receive message to commit new value.
Accept and return on successful commit of value.
21. Did I mention…
We’re HIRING!
Editor's Notes
Consistency - All nodes see the same data at the same time.
performing a read operation will return the value of the most recent write operation causing all nodes to return the same data
Availability - Every request gets a response on success/failure
every client gets a response, regardless of the state of any individual node in the system
Partition Tolerance - System continues to work despite message loss or partial failure
can sustain any amount of network failure that doesn't result in a failure of the entire network
data records are sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages
How do you want to access the data?
All of the use cases your application needs to support.
All lookups your application needs to do.
Note any ordering, filtering or grouping requirements.
Relational world → normalize to minimize redundancy
Smaller, well-structured tables with relationships (foreign keys)
Joins.
Cannot join multiple column families to satisfy a given query request.
Plan for one or more rows in a single column family for each query.
This sacrifices disk space in order to reduce the number of disk seeks.
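A minimal sketch of the idea, in plain Python rather than against a real cluster: two in-memory dictionaries stand in for two column families, one per query. The table names, the `subscribe` helper and the user names other than jbellis are hypothetical and only illustrate the pattern of writing the same fact twice so that each lookup touches a single row.

```python
from collections import defaultdict

# Hypothetical stand-ins for two column families, one per query.
subscriptions_by_user = defaultdict(set)  # answers "Who is X a subscriber of?"
subscribers_by_user = defaultdict(set)    # answers "Who subscribes to X?"


def subscribe(follower, followed):
    """Store the same fact in both tables so each query reads one row."""
    subscriptions_by_user[follower].add(followed)
    subscribers_by_user[followed].add(follower)


# Made-up sample data.
subscribe("jbellis", "user_a")
subscribe("jbellis", "user_b")
subscribe("user_a", "jbellis")

# Each question is answered from a single row, with no join.
print(sorted(subscriptions_by_user["jbellis"]))
print(sorted(subscribers_by_user["jbellis"]))
```

The cost is visible in `subscribe`: every write happens twice, which is the disk space being traded for cheaper reads.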
Row key, a string of virtually unbounded length.
Cassandra does not enforce uniqueness.
Inserting a duplicate row key will upsert the columns contained in the insert statement rather than return a unique constraint violation.
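The upsert behaviour can be pictured with a small in-memory model. This is only a sketch: the `table` dictionary and the `insert` helper are hypothetical, the column values are made up, and nothing here talks to Cassandra.

```python
# Hypothetical model of one column family: row key -> {column name: value}.
table = {}


def insert(row_key, columns):
    """Insert with upsert semantics: merge columns into any existing row."""
    table.setdefault(row_key, {}).update(columns)


insert("jbellis", {"email": "old@example.com", "city": "Austin"})
# Same row key again: no constraint violation, the supplied column is
# overwritten and the columns not mentioned are left as they were.
insert("jbellis", {"email": "new@example.com"})

print(table)
```

Only the columns named in the second insert change, so a writer that assumes "insert fails if the key exists" will silently overwrite data.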
Scenario: You have one bank account, with $100 left in it, and two bank cards.
When you try to withdraw money with the two cards (you and your wife) at the same time at 2 different ATMs, you might get 2 times $100…
PROBLEM!!!
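The scenario can be replayed deterministically. The sketch below is an assumption-laden illustration, not Cassandra code: the `Account` class and its `compare_and_set` method are hypothetical, and the conditional write stands in for the kind of "only if the value is still what I read" check that lightweight transactions are meant to provide.

```python
import threading


class Account:
    """Hypothetical account with an atomic conditional write."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        """Set balance to `new` only if it still equals `expected`."""
        with self._lock:
            if self.balance != expected:
                return False
            self.balance = new
            return True


# Unsafe: both ATMs read the balance before either one writes.
acct = Account(100)
seen_by_atms = (acct.balance, acct.balance)
dispensed_unsafe = 0
for seen in seen_by_atms:
    if seen >= 100:
        acct.balance = seen - 100  # blind write based on a stale read
        dispensed_unsafe += 100

# Conditional: same interleaving, but each write checks what it read.
acct2 = Account(100)
seen_by_atms = (acct2.balance, acct2.balance)
dispensed_safe = 0
for seen in seen_by_atms:
    if seen >= 100 and acct2.compare_and_set(seen, seen - 100):
        dispensed_safe += 100

print(dispensed_unsafe, dispensed_safe)
```

With blind writes both ATMs pay out; with the conditional write the second ATM's update is rejected because the balance is no longer what it read.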
One node acts as a proposer (initiates the protocol).
Only one node can act as proposer at a time, but if two or more choose to then the protocol will (typically) fail to terminate until only one node continues to act as proposer.
Sacrificing termination for correctness.
The other nodes (which conspire to make a decision about the value being proposed) are called ‘acceptors’.
Acceptors respond to proposals from the proposer either by rejecting them for some reason, or agreeing to them in principle and making promises in return about the proposals they will accept in the future.
These promises guarantee that proposals that may come from other proposers will not be erroneously accepted, and in particular they ensure that only the latest of the proposals sent by the proposer is accepted.
Proposer
Acceptors
‘Accept’ here means that an acceptor commits to a proposal as the one it considers definitive.
Once a majority of acceptors have accepted the same proposal, the Paxos protocol can terminate and the proposed value may be disseminated to nodes which are interested in it (these are called ‘listeners’).
Prepare/promise is the core of the algorithm.
Any node may propose a value; we call that node the leader.
The leader picks a ballot and sends it to the participating replicas.
If the ballot is the highest a replica has seen, it promises to not accept any proposals associated with any earlier ballot.
Along with that promise, it includes the most recent proposal it has already received.
If a majority of the nodes promise to accept the leader’s proposal, it may proceed to the actual proposal, but with the wrinkle that if a majority of replicas included an earlier proposal with their promise, then that is the value the leader must propose.
Conceptually, if a leader interrupts an earlier leader, it must first finish that leader’s proposal before proceeding with its own, thus giving us our desired linearizable behavior.
Thus, at the cost of four round trips, we can provide linearizability.
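The prepare/promise and propose/accept exchanges described above can be sketched in a few lines of Python. This is a toy, single-process model, not Cassandra's implementation: the `Acceptor` class and `propose` function are hypothetical, it does not model all four round trips, and it follows the textbook rule of adopting the highest-ballot earlier proposal reported in any promise, which is an assumption on top of these notes.

```python
class Acceptor:
    """Hypothetical replica taking part in one Paxos decision."""

    def __init__(self):
        self.promised = -1    # highest ballot this replica has promised
        self.accepted = None  # (ballot, value) it last accepted, or None

    def prepare(self, ballot):
        """Promise to ignore earlier ballots; report any accepted proposal."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        """Accept the proposal unless a higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors, ballot, value):
    """Run prepare/promise then propose/accept; return the value or None."""
    quorum = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [prev for ok, prev in replies if ok]
    if len(promises) < quorum:
        return None
    earlier = [p for p in promises if p is not None]
    if earlier:
        # An earlier proposal is in flight: finish it instead of our own.
        value = max(earlier, key=lambda p: p[0])[1]
    accepted = sum(a.accept(ballot, value) for a in acceptors)
    return value if accepted >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
# Leader A gets promises for ballot 1, but its accept reaches one replica.
for a in acceptors:
    a.prepare(1)
acceptors[0].accept(1, "A")
# Leader B interrupts with a higher ballot and must finish A's proposal.
chosen = propose(acceptors, 2, "B")
# A late proposal with an old ballot gets no promises.
stale = propose(acceptors, 1, "C")
print(chosen, stale)
```

Leader B ends up committing "A" rather than "B", which is the "finish the earlier leader's proposal first" behaviour, and the stale ballot is rejected by every replica.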