NoSQL databases are one of the most successful technologies of the Big Data era. The database community has faced many challenges, even beyond the Big Data era, and has presented several solutions. Despite these achievements, distributed transactions with external consistency remain one of the hardest problems in computer science.
11. Common Characteristics of NoSQL DBs
● Non-Relational
● Schemaless
● Cluster-Friendly
● Weaker Concurrency Model
12. Consistency
● Logical:
○ Bringing the DB from one consistent state to another consistent state
● Replication:
○ Propagation of write operations to the various nodes
19. Eventual Consistency
● Update partitions independently
● Converge “eventually”
● Probability voting on data value
● In practice: “last-write-wins”
● Requires conflict resolution functionality in the application layer
Source: DTBD
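The “last-write-wins” rule above can be sketched as a register whose merge keeps the write with the highest timestamp. This is a minimal illustration, not any particular database's implementation; the class name `LWWRegister`, the integer timestamps, and the `node_id` tie-breaker are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Hypothetical last-write-wins register (illustrative only)."""
    value: str
    timestamp: int  # assumed: time of the write as seen by the writer
    node_id: str    # assumed: tie-breaker when timestamps are equal

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep whichever write has the larger (timestamp, node_id) pair.
        return max(self, other, key=lambda r: (r.timestamp, r.node_id))


# Two partitions accept conflicting writes independently.
a = LWWRegister("blue", timestamp=10, node_id="n1")
b = LWWRegister("green", timestamp=12, node_id="n2")

# Merging in either direction converges on the same value.
winner = a.merge(b)
print(winner.value)  # green
```

Note that the write "blue" is silently discarded. If losing a write is not acceptable, the application has to detect and resolve the conflict itself, which is the conflict-resolution requirement listed above.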
21. Convergence Algorithms
● Operational transformation (1989)
○ C. A. Ellis and S. J. Gibbs. 1989. Concurrency control in groupware systems. In Proceedings of the 1989 ACM SIGMOD international conference on Management of data (SIGMOD '89)
○ Last Usage in the
● Conflict-free Replicated Data Types (2011-Present)
○ Sakr, Sherif and Zomaya, Albert. Conflict-free Replicated Data Types (CRDT), Encyclopedia of Big Data Technologies, Springer International Publishing, 2018
22. Example: State of the Value
[Figure: value of a replicated set on two timelines (axis: Time). Values shown: {a, b}, {a}, {a, b, c}, {a, c}, {a, c}, {a, b}]
24. CRDTs
● Decentralized solution
● Strong Eventual Consistency:
Once you have seen the same events, you are immediately in the same state.
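Strong eventual consistency can be illustrated with a small state-based CRDT. The sketch below uses a two-phase set (one grow-only set of additions, one of removals), a well-known CRDT chosen here only as an example; the class and method names are assumptions, and a removed element can never be re-added in this design.

```python
from dataclasses import dataclass, field


@dataclass
class TwoPhaseSet:
    """Illustrative state-based two-phase set CRDT."""
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)

    def add(self, item):
        self.added.add(item)

    def remove(self, item):
        # Only elements that were added can be marked as removed.
        if item in self.added:
            self.removed.add(item)

    def value(self):
        return self.added - self.removed

    def merge(self, other):
        # Set union is commutative, associative and idempotent,
        # so the order in which replicas exchange state is irrelevant.
        self.added |= other.added
        self.removed |= other.removed


# Both replicas start from {a, b} and then diverge.
r1 = TwoPhaseSet({"a", "b"})
r2 = TwoPhaseSet({"a", "b"})
r1.remove("b")  # r1 now reads {a}
r2.add("c")     # r2 now reads {a, b, c}

# After exchanging state in either order, both read {a, c}.
r1.merge(r2)
r2.merge(r1)
print(sorted(r1.value()), sorted(r2.value()))  # ['a', 'c'] ['a', 'c']
```

No coordinator or lock is involved: each replica reaches the same state as soon as it has seen the same updates.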
25. Is it enough?
● CRDTs treat one’s transactions as debits and credits
● Deliberately inducing withdrawal races at ATMs
● Need “external consistency”
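The withdrawal race can be reproduced with a counter CRDT. The sketch below models an account balance as a PN-counter (per-node totals of credits and debits); the ATM names, amounts, and the local balance check are assumptions made for the example.

```python
class PNCounter:
    """Illustrative PN-counter: per-node credit and debit totals."""

    def __init__(self, node_id, nodes):
        self.node_id = node_id
        self.credits = {n: 0 for n in nodes}
        self.debits = {n: 0 for n in nodes}

    def value(self):
        return sum(self.credits.values()) - sum(self.debits.values())

    def credit(self, amount):
        self.credits[self.node_id] += amount

    def debit(self, amount):
        # The check sees only this replica's view of the balance.
        if self.value() < amount:
            raise ValueError("insufficient funds")
        self.debits[self.node_id] += amount

    def merge(self, other):
        # Element-wise maximum keeps the merge order-independent.
        for n in self.credits:
            self.credits[n] = max(self.credits[n], other.credits[n])
            self.debits[n] = max(self.debits[n], other.debits[n])


nodes = ["atm1", "atm2"]
atm1, atm2 = PNCounter("atm1", nodes), PNCounter("atm2", nodes)

atm1.credit(100)
atm2.merge(atm1)  # both ATMs now see a balance of 100

# Concurrent withdrawals: each local check passes.
atm1.debit(100)
atm2.debit(100)

atm1.merge(atm2)
atm2.merge(atm1)
print(atm1.value())  # -100
```

Both replicas converge, but they converge on an overdrawn account. Convergence alone does not enforce the invariant "balance >= 0"; that requires ordering the two withdrawals against each other.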
26. Google Spanner
● Mission-critical scalable relational DB service
○ “Spanner: Google’s Globally Distributed Database” in ACM Transactions on Computer Systems (TOCS), August 2013, page 22
● Lock-free distributed read transactions
● Scale out write transactions linearly
● Cross-node ACID transactions guaranteed
27. Enabling Technologies
● Multi-shard transactions by a two-phase algorithm
● Updates to each shard in real-time order (serializability)
● Interval-based global time ensures external consistency
● Physical atomic clocks synchronize time within a small error bound
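How interval-based time yields external consistency can be sketched with a "commit wait": a transaction picks a timestamp at the upper end of the current uncertainty interval and does not report success until that timestamp is guaranteed to be in the past. The names `tt_now` and `tt_after` loosely follow the TT.now() and TT.after() operations described in the Spanner paper; the fixed `EPSILON` and the use of the local monotonic clock are assumptions for this single-machine sketch.

```python
import time
from dataclasses import dataclass

# Assumed clock uncertainty bound in seconds (illustrative value only).
EPSILON = 0.005


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


def tt_now() -> TTInterval:
    # Stand-in for a clock service: true time is assumed to lie
    # somewhere inside [earliest, latest].
    t = time.monotonic()
    return TTInterval(t - EPSILON, t + EPSILON)


def tt_after(ts: float) -> bool:
    # True only once ts has definitely passed.
    return tt_now().earliest > ts


def commit() -> float:
    # Choose a timestamp no earlier than the true current time.
    commit_ts = tt_now().latest
    # Commit wait: block until commit_ts is definitely in the past,
    # so any transaction that starts afterwards gets a larger timestamp.
    while not tt_after(commit_ts):
        time.sleep(EPSILON / 10)
    return commit_ts


t1 = commit()
t2 = commit()  # starts after the first commit has returned
print(t2 > t1)  # True
```

The wait lasts roughly twice the uncertainty bound, which is why a small error bound from physical clocks matters: the tighter the bound, the shorter every write has to wait.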
30. Alternative Solutions
● Querying a single physical clock:
○ “Large-scale incremental processing using distributed transactions and notifications”, In Proceedings of the 9th USENIX conference on Operating systems design and implementation (OSDI'10).
● Avoiding physical clock by Calvin:
○ “Calvin: fast distributed transactions for partitioned database systems”, In Proceedings of the 2012 ACM SIGMOD
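The first alternative, querying a single clock, can be sketched as a timestamp oracle: one service hands out strictly increasing timestamps, so every transaction is ordered by a single source of truth. This is only a sketch of the idea, not the implementation from the cited paper; the class name `TimestampOracle` and the in-process threads standing in for clients are assumptions.

```python
import threading


class TimestampOracle:
    """Illustrative single source of strictly increasing timestamps."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = 0

    def next_timestamp(self) -> int:
        # The lock serializes all callers, so no two get the same value.
        with self._lock:
            self._last += 1
            return self._last


oracle = TimestampOracle()
issued = []
issued_lock = threading.Lock()


def client(requests: int) -> None:
    for _ in range(requests):
        ts = oracle.next_timestamp()
        with issued_lock:
            issued.append(ts)


threads = [threading.Thread(target=client, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every timestamp is unique and there are no gaps.
print(sorted(issued) == list(range(1, 401)))  # True
```

The design trades clock uncertainty for a central dependency: ordering is trivial, but every transaction has to reach the oracle.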
31. Summary
● NoSQL databases have succeeded as Big Data technologies
● The need for ACID transactions has not vanished
● Distributed transactions with external consistency remain to be solved