Review - Scalable Atomic Visibility with RAMP Transactions
1. Introduction
Scalable Atomic Visibility with
RAMP Transactions
Peter Bailis, Alan Fekete², Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica
UC Berkeley and ²University of Sydney
Iskandar Setiadi
13511073
Advanced Distributed System
Institut Teknologi Bandung
April 21, 2015
2. Iskandar Setiadi
Introduction
Overview and Motivation
Semantic and System Model
RAMP Transaction Algorithms
Experimental Evaluation
Note: Due to time restrictions, several additional
details and further optimizations are left as an
exercise for the reader.
Outline
3. Iskandar Setiadi
Transaction
A sequence of operations performed as a single
logical unit of work
Atomically Visible Transactional Access
Cases where all or none of each transaction’s
effects should be visible
If a transaction T1 writes x = 1 and y = 1, then
another transaction T2 should not read x = 1 and
y = null.
Introduction
4. Iskandar Setiadi
Scalability and Atomic Visibility
Many traditional transactional mechanisms use
two-phase locking and variants of optimistic
concurrency control to ensure the correctness of
transactions.
In a distributed environment, these algorithms are
slow and, under failure, unavailable.
Current Problems
5. Iskandar Setiadi
Read Atomic Multi-Partition (RAMP)
This algorithm enforces atomic visibility while offering
excellent scalability, guaranteed commit despite partial
failures (via synchronization independence), and
minimized communication between servers (via
partition independence).
RAMP transactions allow reads to “race” writes: a reader can
autonomously detect the presence of non-atomic reads
and, if necessary, repair them via a second round of
communication with servers.
Read Atomic (RA) Isolation
6. Iskandar Setiadi
RAMP uses an atomic commitment protocol (ACP)
with non-blocking concurrency control
mechanisms: individual transactions can stall due
to failures or communication delays without
forcing other transactions to stall.
Overview
7. Iskandar Setiadi
Facebook and LinkedIn
Espresso allow a user to
perform a “like” action on a
certain message / post.
Violations of atomic
visibility may surface as
broken bi-directional
relationships (e.g. the friend
relationship in Facebook)
and dangling references.
Motivation: Foreign Key Constraints
8. Iskandar Setiadi
Secondary Indexing
Searching data via secondary attributes (e.g.
birth date) is challenging. Cassandra and
Google Megastore support local secondary
indexes, which require contacting every partition
for secondary attribute lookups.
Materialized View Maintenance
Example: Mailbox “unread message counter”
Motivation (Cont.)
9. Iskandar Setiadi
Fractured Reads
A transaction T_j exhibits fractured reads if
transaction T_i writes versions x_m and y_n (in any
order, with x possibly but not necessarily equal to
y), T_j reads version x_m and version y_k, and k < n.
Read Atomic (RA) isolation prevents fractured
read anomalies and also prevents transactions
from reading uncommitted, aborted, or
intermediate data, so each transaction reads from
a snapshot view.
Semantic and System Model
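The fractured-read definition above can be checked mechanically. The following is a minimal sketch, assuming each version is identified by the timestamp of the transaction that wrote it and that timestamps are unique and totally ordered; the function name and the dictionary layout are illustrative and do not come from the paper.

```python
from typing import Dict, Set


def has_fractured_read(reads: Dict[str, int], writes: Dict[int, Set[str]]) -> bool:
    """Return True if the read set of T_j is fractured.

    reads:  item -> timestamp of the version that T_j read.
    writes: transaction timestamp -> set of items that transaction wrote
            (assumption: every item in the set was written at that timestamp).
    """
    for item, ts in reads.items():
        # T_i is the transaction that wrote the version of `item` seen by T_j.
        for sibling in writes.get(ts, set()):
            # T_j also read `sibling`, but at an older version than T_i wrote.
            if sibling in reads and reads[sibling] < ts:
                return True
    return False


# T_i (timestamp 1) wrote both x and y; T_j saw x at 1 but y at the older 0.
fractured = has_fractured_read({"x": 1, "y": 0}, {1: {"x", "y"}})
atomic = has_fractured_read({"x": 1, "y": 1}, {1: {"x", "y"}})
```

Reading a newer version of y than the one T_i wrote is not flagged, which matches the condition k < n in the definition.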
10. Iskandar Setiadi
RA does not prevent concurrent updates or
provide serial access to data items.
Example: RA cannot be used to maintain bank
account balances. RA is a better fit for the
“friend” operation.
RA Implications & Limitations
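To see why RA alone is not enough for account balances, consider two concurrent read-modify-write transactions under a “last writer wins” policy. This is a toy sketch with made-up amounts and a hypothetical helper name, not code from the paper.

```python
from typing import List, Tuple


def last_writer_wins(versions: List[Tuple[int, int]]) -> int:
    """Pick the value with the highest timestamp from (timestamp, value) pairs."""
    return max(versions, key=lambda v: v[0])[1]


balance = 100

# Both transactions read the same committed balance before either one writes.
t1_write = (1, balance + 50)   # deposit 50, timestamp 1
t2_write = (2, balance + 25)   # deposit 25, timestamp 2

final_balance = last_writer_wins([t1_write, t2_write])
expected_if_serial = balance + 50 + 25
```

Neither transaction read a fractured state, yet the first deposit is overwritten: `final_balance` differs from `expected_if_serial`. Preventing this kind of lost update requires a stronger mechanism than RA.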
11. Iskandar Setiadi
Given the specification of RA isolation and scalability, the
following algorithms focus on providing read-only and
write-only transactions with a “last writer wins” overwrite
policy.
Three variants:
1. RAMP-Fast (RAMP-F): metadata size is linear in
transaction size (not data size)
2. RAMP-Hybrid (RAMP-H): constant-factor metadata
3. RAMP-Small (RAMP-S): constant-factor metadata
RAMP Transaction Algorithms
12. Iskandar Setiadi
One RTT for reads in the
stable case; a second RTT
only to repair partial reads
Two RTTs for writes
RAMP-Fast
14. Iskandar Setiadi
Write
In the PREPARE phase, each partition adds the
write to its local database.
In the COMMIT phase, each partition updates an
index containing the highest-timestamped
committed version of each item.
Read
Fetch the last committed version of each
item and calculate whether the result is “missing” any
versions; missing versions are fetched in a second round.
RAMP-Fast (Cont.)
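The write and read steps above can be sketched in a single process. This is an illustrative sketch only: it assumes one partition per item, integer timestamps starting at 1, and no failures or networking. The class and function names (`Version`, `Partition`, `write_all`, `read_all`) are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Version:
    value: object
    ts: int                      # timestamp of the writing transaction
    siblings: FrozenSet[str]     # other items written by the same transaction


class Partition:
    def __init__(self) -> None:
        self.versions: Dict[Tuple[str, int], Version] = {}
        self.last_commit: Dict[str, int] = {}

    def prepare(self, item: str, version: Version) -> None:
        # PREPARE: store the write locally without making it visible yet.
        self.versions[(item, version.ts)] = version

    def commit(self, item: str, ts: int) -> None:
        # COMMIT: advance the index of the highest committed timestamp.
        self.last_commit[item] = max(self.last_commit.get(item, 0), ts)

    def get(self, item: str, ts: Optional[int] = None) -> Optional[Version]:
        # Without a timestamp, return the last committed version.
        if ts is None:
            ts = self.last_commit.get(item)
        return self.versions.get((item, ts))


def write_all(partitions: Dict[str, Partition], writes: Dict[str, object], ts: int) -> None:
    for item, value in writes.items():
        siblings = frozenset(writes) - {item}
        partitions[item].prepare(item, Version(value, ts, siblings))
    for item in writes:
        partitions[item].commit(item, ts)


def read_all(partitions: Dict[str, Partition], items: Iterable[str]) -> Dict[str, Optional[Version]]:
    # First round: last committed version of every item.
    result = {i: partitions[i].get(i) for i in items}
    # Use sibling metadata to find the newest timestamp each item should have.
    required: Dict[str, int] = {}
    for version in result.values():
        if version is None:
            continue
        for sibling in version.siblings:
            if sibling in result:
                required[sibling] = max(required.get(sibling, 0), version.ts)
    # Second round: fetch only the versions that are missing.
    for item, ts in required.items():
        current = result[item]
        if current is None or current.ts < ts:
            result[item] = partitions[item].get(item, ts)
    return result


parts = {"x": Partition(), "y": Partition()}
write_all(parts, {"x": 0, "y": 0}, ts=1)
# Simulate a write at timestamp 2 that has prepared everywhere but committed only on x.
parts["x"].prepare("x", Version(1, 2, frozenset({"y"})))
parts["y"].prepare("y", Version(1, 2, frozenset({"x"})))
parts["x"].commit("x", 2)
snapshot = read_all(parts, ["x", "y"])
```

In the simulated race, the first round returns x at timestamp 2 and y at timestamp 1; the sibling metadata on x reveals that y is missing timestamp 2, and the second round retrieves the prepared version.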
16. Iskandar Setiadi
RAMP-S uses constant-size metadata but
always requires two RTTs for reads.
First round of reads: fetch the highest
committed timestamp for each item from its
respective partition
Second round of reads: retrieve the highest-
timestamped version of the item that also
appears in the supplied set of timestamps
RAMP-Small
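The two read rounds can be sketched as follows. This is a simplified, single-process illustration that assumes one partition per item and integer timestamps starting at 1 (0 stands for "nothing committed"); `SmallPartition` and `read_all_small` are hypothetical names for this example.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


class SmallPartition:
    def __init__(self) -> None:
        self.versions: Dict[Tuple[str, int], object] = {}
        self.last_commit: Dict[str, int] = {}

    def prepare(self, item: str, ts: int, value: object) -> None:
        # The only per-version metadata is the timestamp itself.
        self.versions[(item, ts)] = value

    def commit(self, item: str, ts: int) -> None:
        self.last_commit[item] = max(self.last_commit.get(item, 0), ts)

    def latest_ts(self, item: str) -> int:
        # Round 1: highest committed timestamp for the item.
        return self.last_commit.get(item, 0)

    def get_from_set(self, item: str, ts_set: Set[int]) -> Optional[object]:
        # Round 2: highest-timestamped version that appears in the supplied set.
        matches = [ts for ts in ts_set if (item, ts) in self.versions]
        if not matches:
            return None
        return self.versions[(item, max(matches))]


def read_all_small(partitions: Dict[str, SmallPartition], items: Iterable[str]) -> Dict[str, Optional[object]]:
    items = list(items)
    ts_set = {partitions[i].latest_ts(i) for i in items}               # first RTT
    return {i: partitions[i].get_from_set(i, ts_set) for i in items}   # second RTT


small = {"x": SmallPartition(), "y": SmallPartition()}
for name in ("x", "y"):
    small[name].prepare(name, 1, "old")
    small[name].commit(name, 1)
    small[name].prepare(name, 2, "new")
small["x"].commit("x", 2)   # the write at timestamp 2 has committed only on x so far
values = read_all_small(small, ["x", "y"])
```

Because the timestamp set gathered in the first round contains 2, the second round returns the prepared version of y at timestamp 2 rather than the older committed one.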
18. Iskandar Setiadi
RAMP-H Write: store a Bloom filter as the
metadata
RAMP-H Read: Same as RAMP-F, except this
algorithm computes a list of potentially higher-
timestamped writes for each item from the
Bloom filter. Any potentially missing versions
are fetched in a second round of reads.
RAMP-Hybrid
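A rough sketch of the client-side step in a RAMP-H read follows: given the first-round results, use each version's Bloom filter to decide which items may have a newer sibling write. The tiny Bloom filter here (SHA-256 based, 256 bits, 4 hash functions) and the names `BloomFilter` and `potentially_missing` are assumptions made for illustration, not the paper's implementation.

```python
import hashlib
from typing import Dict, Iterator, Set, Tuple


class BloomFilter:
    def __init__(self, size_bits: int = 256, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit vector stored in a Python int

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # May return a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def potentially_missing(first_round: Dict[str, Tuple[int, BloomFilter]]) -> Dict[str, Set[int]]:
    """first_round: item -> (timestamp read, Bloom filter of the writer's items).

    Returns item -> timestamps of possibly newer sibling writes to fetch
    in the second round.
    """
    candidates: Dict[str, Set[int]] = {}
    for item, (ts, _) in first_round.items():
        for other, (other_ts, other_bloom) in first_round.items():
            if other != item and other_ts > ts and other_bloom.might_contain(item):
                candidates.setdefault(item, set()).add(other_ts)
    return candidates


bloom_t1, bloom_t2 = BloomFilter(), BloomFilter()
for key in ("x", "y"):
    bloom_t1.add(key)   # transaction 1 wrote x and y
    bloom_t2.add(key)   # transaction 2 wrote x and y
# First round saw x from transaction 2 but y still from transaction 1.
to_fetch = potentially_missing({"x": (2, bloom_t2), "y": (1, bloom_t1)})
```

The second round would then request y restricted to the timestamps in `to_fetch["y"]`; a request caused by a Bloom filter false positive simply finds no version with that timestamp.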
20. Iskandar Setiadi
Safety Properties
A Bloom filter may produce false positives.
The paper's appendix proves that false positives
do not compromise the integrity of the result
set: with unique timestamps, any read triggered by
a false positive returns null.
RAMP-Hybrid (Cont.)
23. Iskandar Setiadi
RAMP-F, RAMP-H, and often RAMP-S outperform
existing solutions across a range of workload
conditions while exhibiting overheads typically
within 8% and no more than 48% of peak
throughput.
Each algorithm is evaluated using the YCSB
benchmark on several cr1.8xlarge instances on
Amazon EC2, with a workload of 95% reads and
5% writes.
Experimental Evaluation
24. Iskandar Setiadi
LWLR: Long write locks and long read locks, providing
Repeatable Read Isolation (PL-2.99)
LWSR: Long write locks with short read locks,
providing Read Committed Isolation (PL-2L, ≠ RA)
LWNR: Long write locks with no read locks, providing Read
Uncommitted Isolation (≠ RA)
NWNR: No locks, baseline performance for parallelized
operations
E-PCI: Eiger system’s 2PC-PCI, where for each
transaction a designated “coordinator” server enforces
RA isolation
Notation
27. Iskandar Setiadi
Cooperative Termination Protocol (CTP)
Some transactions may stall, leaving their operations
blocked. CTP is used to “free” these stalled operations.
In a real environment, blocked operations are expected to
occur only at a modest failure rate of about 1 in 1000 writes.
Thus, the average-case overheads are small.
CTP Reference: P. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency
Control and Recovery in Database Systems. Addison-Wesley, New York, 1987.
Experimental: CTP Overhead
28. Iskandar Setiadi
With 100 servers (spread
across several availability
zones of EC2), RAMP-F
was within 2.6%,
RAMP-H within 3.4%,
and RAMP-S within
45% of NWNR.
Experimental: Scalability