18. MySQL Cluster isolation level is READ_COMMITTED
Isolation Level                         Dirty Read     Fuzzy Read     Phantom Read
Read uncommitted                        Possible       Possible       Possible
Read committed                          Not possible   Possible       Possible
Repeatable read (snapshot isolation)    Not possible   Not possible   Possible
Serializable                            Not possible   Not possible   Not possible
20. Snapshot Isolation
-----------------------------------------------------------------------------------------------------------------------
Algorithm: Snapshot-isolation schema
-----------------------------------------------------------------------------------------------------------------------
initially: snapshot.clear;

operation doOperation
    tx.begin
    snapshotting()
    performTask()
    tx.commit

operation snapshotting
    foreach x in op do
        snapshot <- tx.find(x.query)

operation performTask
    //Operation Body, referring to cache for data
-----------------------------------------------------------------------------------------------------------------------
Consistent snapshot of data
Commit if no conflicting updates
No fuzzy read
Prevent modification conflict:
    Optimistic
    Pessimistic
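The schema above, combined with the optimistic option, could be sketched as follows. This is a minimal illustration, not the implementation behind the slides: the `Store` class, its per-row version counter, and `ConflictError` are all assumptions introduced here to stand in for the database and its conflict check.

```python
class ConflictError(Exception):
    """Raised when a snapshotted row changed before commit (hypothetical)."""


class Store:
    """Toy in-memory table: key -> (version, value). Stands in for the database."""

    def __init__(self):
        self.rows = {}

    def find(self, key):
        # Missing rows read as version 0 with no value.
        return self.rows.get(key, (0, None))

    def commit(self, read_versions, writes):
        # Optimistic check: abort if any snapshotted row has a different version.
        for key, version in read_versions.items():
            if self.find(key)[0] != version:
                raise ConflictError(key)
        for key, value in writes.items():
            self.rows[key] = (self.find(key)[0] + 1, value)


def do_operation(store, keys, perform_task):
    # snapshotting(): read every row the operation needs, once, up front.
    versions, snapshot = {}, {}
    for key in keys:
        versions[key], snapshot[key] = store.find(key)
    # performTask(): the body only sees the snapshot, never the store.
    writes = perform_task(snapshot)
    # tx.commit: succeeds only if no conflicting update happened meanwhile.
    store.commit(versions, writes)
    return writes
```

Because the task body reads only from the snapshot, it cannot observe a fuzzy read; a conflicting update surfaces as a failed commit instead.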
21. Row level locking
-----------------------------------------------------------------------------------------------------------------------
Algorithm: Snapshot-isolation with row-level lock schema
-----------------------------------------------------------------------------------------------------------------------
initially: snapshot.clear;

operation doOperation
    tx.begin
    snapshotting()
    performTask()
    tx.commit

operation snapshotting
    foreach x in op do
        tx.lockLevel(x.lockType)
        snapshot <- tx.find(x.query)

operation performTask
    //Operation Body, referring to cache for data
-----------------------------------------------------------------------------------------------------------------------
Conflict prevention instead of resolution
Supported by MySQL Cluster
Lock level affects parallelization factor
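To make the effect of the lock level on parallelism concrete, here is a small single-threaded sketch. The `RowLocks` table, the `SHARED`/`EXCLUSIVE` names, and the `try_lock` interface are hypothetical and only model lock compatibility; they are not the database's API, and lock upgrades are deliberately not handled.

```python
SHARED, EXCLUSIVE = "shared", "exclusive"


class RowLocks:
    """Toy lock table: key -> (mode, set of holder transaction ids)."""

    def __init__(self):
        self.held = {}

    def try_lock(self, tx, key, mode):
        current = self.held.get(key)
        if current is None:
            self.held[key] = (mode, {tx})
            return True
        held_mode, holders = current
        # Only shared + shared is compatible; every other pair must wait.
        if mode == SHARED and held_mode == SHARED:
            holders.add(tx)
            return True
        return False

    def release_all(self, tx):
        for key in list(self.held):
            _, holders = self.held[key]
            holders.discard(tx)
            if not holders:
                del self.held[key]


def snapshotting(locks, tx, store, requests):
    """Lock each row at the requested level, then read it into the snapshot."""
    snapshot = {}
    for key, mode in requests:
        if not locks.try_lock(tx, key, mode):
            locks.release_all(tx)
            return None  # a real transaction would block or retry here
        snapshot[key] = store.get(key)
    return snapshot
```

Two transactions that take shared locks on the same row both proceed, while an exclusive lock serializes everyone behind it, which is why weaker lock levels allow more operations to run in parallel.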
22. Maintaining HDFS Semantics
Does snapshot + lock ensure correctness of all HDFS operations? No.
Mutations can be independent at the row level and still be semantically incorrect.
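One hypothetical case of such a semantically incorrect outcome, chosen here for illustration and not taken from the slides: one operation creates a file under a directory while another deletes that directory. Each validates against its own snapshot and writes a different row, so row-level locks never conflict. The dictionary-based `table` and the helper names are assumptions.

```python
def run_interleaving():
    """Interleave 'create /a/b' and 'delete /a' on a toy path -> kind table."""
    table = {"/": "dir", "/a": "dir"}
    # Snapshot phase: each operation validates against what it read.
    create_ok = "/a" in table                                # parent exists
    delete_ok = not any(p.startswith("/a/") for p in table)  # directory is empty
    # Commit phase: the writes touch different rows, so no lock conflict arises.
    if create_ok:
        table["/a/b"] = "file"
    if delete_ok:
        del table["/a"]
    return table


def orphans(table):
    """Return paths whose parent directory row is missing."""
    missing = []
    for path in table:
        parent = path.rsplit("/", 1)[0] or "/"
        if path != "/" and parent not in table:
            missing.append(path)
    return missing
```

Both operations succeed, yet `orphans(run_interleaving())` reports `/a/b` as a file without a parent directory, a state the file system should never reach.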
25. Total Order Locking
Total order rule:
Notations:
X = {x | x is a metadata object}
R = {r | r is a read operation}
W = {w | w is a write operation}
Serialization rule:
26. Complete Locking Solution
-----------------------------------------------------------------------------------------------------------------------
Algorithm: Snapshot-isolation with total ordered row-level lock schema
-----------------------------------------------------------------------------------------------------------------------
initially: snapshot.clear;

operation doOperation
    tx.begin
    snapshotting()
    performTask()
    tx.commit

operation snapshotting
    S = total_order_sort(op.X)
    foreach x in S do
        if x is a parent then level = x.parent_level_lock
        else level = x.strongest_lock_type
        tx.lockLevel(level)
        snapshot <- tx.find(x.query)

operation performTask
    //Operation Body, referring to cache for data
-----------------------------------------------------------------------------------------------------------------------
Conflicting orders -> Total Order Locking
Lock upgrade -> Acquire strongest required lock-level
Semantically related -> Parent Lock
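The three rules above can be sketched as a lock-planning step that runs before any row is read. This is an illustration under assumptions: `plan_locks`, the numeric `READ`/`WRITE` levels, and the use of plain string order as the total order are all introduced here and are not the deck's actual definitions.

```python
READ, WRITE = 1, 2  # a higher number means a stronger lock


def plan_locks(requests, parent_levels):
    """Return [(object, level)] in one global order, one lock per object.

    requests:      (object, level) pairs the operation needs, in any order.
    parent_levels: object -> lock level for objects locked as a parent,
                   protecting the semantically related rows beneath them.
    """
    strongest = {}
    for obj, level in requests:
        # Take the strongest level up front so no lock upgrade is needed later.
        strongest[obj] = max(level, strongest.get(obj, level))
    for obj, level in parent_levels.items():
        # Parents use their parent-level lock instead of the requested one.
        strongest[obj] = level
    # Sorting by a key every transaction shares yields the total order.
    return sorted(strongest.items())
```

Since every operation acquires its locks in the same order, two operations cannot each hold a lock the other is waiting for, and taking the strongest level once removes the upgrade path that could otherwise deadlock.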
27. Total order of NameNode metadata
Step    Metadata Objects
1       Directory#1 (root)
2       Directory#2
...     ...
n       Directory#n
n+1     File
n+2     Block-Infos, Leases
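A sort key that follows this ordering might look like the sketch below. It assumes each metadata object is represented as a hypothetical `(kind, path)` pair; the kind names and the use of path depth to order directories from the root downward are assumptions made for the example.

```python
# Directories come first, then the file, then block-infos and leases.
KIND_RANK = {"directory": 0, "file": 1, "block-info": 2, "lease": 2}


def order_key(obj):
    """Total-order key for a hypothetical (kind, path) metadata object."""
    kind, path = obj
    # The root has depth 0; each further path component adds one level.
    depth = 0 if path == "/" else path.count("/")
    return (KIND_RANK[kind], depth, path)


def total_order_sort(objects):
    """Sort metadata objects into the locking order."""
    return sorted(objects, key=order_key)
```

For example, `total_order_sort` places `("directory", "/")` before `("directory", "/a")`, both before `("file", "/a/f")`, and any lease on that file last.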
28. Operations Implemented as Multiple Transactions
Phase             Step    Metadata Objects
t1: No Lock       1       given-blocks
t2: Basic-Order   1       file
t2: Basic-Order   2       block
t2: Basic-Order   3       replicas, corrupted-replicas, excess-replicas,
                          under-replicated-block, pending-block,
                          replicas-under-construction, invalidated-blocks
29. Safety and Liveness
1. Single-transaction operations
    S (safety): Fine-grained serialization
    L (liveness): Total-order lock + no lock upgrade
2. Multi-transaction operations
    S (safety): No dependencies in group of metadata
        + No mutation in 1st transaction
        + Validation in 2nd transaction
    L (liveness): Limited number of retries
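The multi-transaction pattern (a read-only first transaction, a validating second transaction, and a bounded number of retries) can be sketched as below. The function name, the `StaleRead` exception, and the default of three attempts are assumptions for illustration; the slides do not state a retry limit.

```python
class StaleRead(Exception):
    """Data read in the first transaction changed before the second (hypothetical)."""


def run_multi_tx(read_phase, write_phase, max_attempts=3):
    """Run a two-transaction operation with a bounded number of attempts.

    read_phase():          transaction 1, takes no locks and mutates nothing.
    write_phase(observed): transaction 2, locks in total order, validates what
                           transaction 1 observed, then mutates. It raises
                           StaleRead when validation fails.
    """
    for _ in range(max_attempts):
        observed = read_phase()
        try:
            return write_phase(observed)
        except StaleRead:
            continue  # nothing was mutated, so starting over is safe
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

Because the first transaction never mutates anything, a failed validation leaves no partial state behind, and the attempt limit keeps an operation from retrying forever.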
Shared nothing, no single point of failure
More than a billion reads and writes per minute
Horizontally scalable storage and throughput
Real-time transactions