Introduction
Scalable Atomic Visibility with RAMP Transactions
Peter Bailis, Alan Fekete², Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica
UC Berkeley and ²University of Sydney
Iskandar Setiadi
13511073
Advanced Distributed System
Institut Teknologi Bandung
April 21, 2015
April 21, 2015 1
Outline
Introduction
Overview and Motivation
Semantics and System Model
RAMP Transaction Algorithms
Experimental Evaluation
Note: Due to time constraints, several additional details and further optimizations are left as an exercise for the reader.
Introduction
Transaction
A sequence of operations performed as a single logical unit of work.
Atomically Visible Transactional Access
Cases where either all or none of each transaction’s effects should be visible.
If a transaction T1 writes x = 1 and y = 1, then another transaction T2 should not read x = 1 and y = null.
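To make the anomaly concrete, here is a minimal sketch assuming x and y live on two different partitions, modeled as plain dicts, and that T1 applies its two writes independently with no coordination. The names `partition_a`, `partition_b`, and `t2_read` are hypothetical. A read that lands between T1's two writes observes x = 1 and y = null (`None`).

```python
# Hypothetical setup: x and y live on different partitions (plain dicts),
# and T1's two writes are applied independently, with no coordination.
partition_a = {}
partition_b = {}


def t2_read():
    """T2 reads x and y; None means no value is visible yet."""
    return partition_a.get("x"), partition_b.get("y")


partition_a["x"] = 1   # T1's write to x reaches partition A
observed = t2_read()   # T2 happens to read before T1's write to y lands
partition_b["y"] = 1   # T1's write to y reaches partition B

print(observed)        # (1, None): T2 observed only part of T1's effects
```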
Current Problems
Scalability and Atomic Visibility
Many traditional transactional mechanisms use two-phase locking and variants of optimistic concurrency control to ensure the correctness of transactions.
In a distributed environment, these mechanisms are often slow and, under failure, unavailable.
Read Atomic (RA) Isolation
Read Atomic Multi-Partition (RAMP)
These algorithms enforce atomic visibility while offering excellent scalability, guaranteed commit despite partial failures (via synchronization independence), and minimized communication between servers (via partition independence).
RAMP transactions allow reads to “race” writes: a reader can autonomously detect the presence of non-atomic reads and, if necessary, repair them via a second round of communication with servers.
Overview
RAMP combines an atomic commitment protocol (ACP) with non-blocking concurrency control mechanisms: individual transactions can stall due to failures or communication delays without forcing other transactions to stall.
Motivation: Foreign Key Constraints
Facebook and LinkedIn’s Espresso allow a user to perform a “like” action on a message or post.
Violations of atomic visibility may surface as broken bi-directional relationships (e.g., a Facebook friendship recorded on only one side) and as dangling references.
Motivation (Cont.)
Secondary Indexing
Searching data via secondary attributes (e.g., birth date) is challenging. Cassandra and Google Megastore support local secondary indexes, which require contacting every partition for a secondary-attribute lookup.
Materialized View Maintenance
Example: a mailbox’s “unread message counter”, which should stay consistent with the messages it counts.
Semantics and System Model
Fractured Reads
A transaction $T_j$ exhibits fractured reads if transaction $T_i$ writes versions $x_m$ and $y_n$ (in any order, with $x$ possibly but not necessarily equal to $y$), $T_j$ reads version $x_m$ and version $y_k$, and $k < n$.
Read Atomic (RA) isolation prevents fractured read anomalies and also prevents transactions from reading uncommitted, aborted, or intermediate data. RA thus gives each transaction a “snapshot” view of the database that respects transaction boundaries.
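The fractured-read definition can be checked mechanically. This is a minimal sketch under stated assumptions: each transaction tags every version it writes with one unique timestamp (so $n = m$ whenever $x_m$ and $y_n$ come from the same $T_i$), and timestamp 0 denotes the initial, never-written version. The function `has_fractured_read` and the dict layouts are hypothetical.

```python
from typing import Dict, Set

# Model assumptions (hypothetical, for illustration):
#  - each transaction has a unique timestamp, and every version it writes
#    is tagged with that timestamp, so "x_m" is item x at timestamp m;
#  - timestamp 0 stands for the initial (never-written) version.
WriteSets = Dict[int, Set[str]]   # timestamp -> items that transaction wrote
ReadSet = Dict[str, int]          # item -> timestamp of the version read


def has_fractured_read(reads: ReadSet, write_sets: WriteSets) -> bool:
    """True if T_j read some x_m, m's writer also wrote y_n, and T_j read y_k with k < n."""
    for x, m in reads.items():
        for y in write_sets.get(m, set()):
            k = reads.get(y)
            # Under this model n == m, because T_i tags all of its writes with m.
            if k is not None and k < m:
                return True
    return False


write_sets = {1: {"x", "y"}}                              # T_i (ts=1) wrote x and y
print(has_fractured_read({"x": 1, "y": 0}, write_sets))   # True: y is stale
print(has_fractured_read({"x": 1, "y": 1}, write_sets))   # False
```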
RA Implications & Limitations
RA does not prevent concurrent updates or provide serial access to data items.
Example: RA alone cannot maintain bank account balances, because two concurrent withdrawals may both read the same balance and one of the updates is lost. RA is a better fit for the “friend” operation.
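A minimal sketch of the lost update behind the bank-balance example, assuming a hypothetical last-writer-wins store (`LWWStore`) that keeps whichever write carries the highest timestamp, regardless of what the writer had read:

```python
from typing import Dict


class LWWStore:
    """Hypothetical last-writer-wins store: the highest timestamp wins."""

    def __init__(self) -> None:
        self.value: Dict[str, int] = {}
        self.ts: Dict[str, int] = {}

    def read(self, item: str) -> int:
        return self.value.get(item, 0)

    def write(self, item: str, val: int, ts: int) -> None:
        # Blind overwrite: what the writer previously read is never checked.
        if ts > self.ts.get(item, -1):
            self.value[item] = val
            self.ts[item] = ts


store = LWWStore()
store.write("balance", 100, ts=1)

# Two concurrent withdrawals (30 and 50) both read the same balance...
seen_by_a = store.read("balance")
seen_by_b = store.read("balance")
# ...and each writes back its own result.
store.write("balance", seen_by_a - 30, ts=2)
store.write("balance", seen_by_b - 50, ts=3)

print(store.read("balance"))  # 50, not 20: the first withdrawal was lost
```

Each write is individually atomic, yet the final balance is 50 instead of 20. Avoiding this requires serializing the two read-modify-write sequences, which RA does not attempt to do.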
RAMP Transaction Algorithms
Given the specification of RA isolation and the scalability requirements, the following algorithms focus on read-only and write-only transactions with a “last writer wins” overwrite policy.
Three variants:
1. RAMP-Fast (RAMP-F): metadata size is linear in the transaction size (number of items written), not in the data size
2. RAMP-Hybrid (RAMP-H): constant-size metadata
3. RAMP-Small (RAMP-S): constant-size metadata
RAMP-Fast
One RTT for reads in the stable case (no race with an in-progress write); a second RTT is needed only to repair partial reads
Two RTTs for writes (PREPARE and COMMIT)
RAMP-Fast (Cont.)
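RAMP-Fast (Cont.)
Write
In the PREPARE phase, each partition adds the write (tagged with the transaction’s timestamp and metadata listing the other items in its write set) to its local database.
In the COMMIT phase, each partition updates an index containing the highest-timestamped committed version of each item.
Read
First round: fetch the last committed version of each item and use the attached metadata to calculate whether the result is “missing” any versions.
Second round (only if needed): fetch each missing version by its exact timestamp.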
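The write and read steps above can be sketched in Python as follows. This is a single-process sketch, not the authors’ implementation: partitions are in-memory objects, a dict `parts` routes each item to its partition, timestamps are assumed unique and supplied by the caller, and `Version`, `Partition`, `put_all`, and `get_all` are hypothetical names. A version is “missing” when another returned version’s metadata names the item with a higher timestamp than the one the reader got.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    item: str
    value: object
    ts: int          # transaction timestamp, assumed unique per transaction
    md: frozenset    # RAMP-F metadata: the other items in the write set


class Partition:
    def __init__(self) -> None:
        self.versions: Dict[Tuple[str, int], Version] = {}
        self.last_commit: Dict[str, int] = {}   # item -> highest committed ts

    def prepare(self, v: Version) -> None:
        # PREPARE: store the version; it is not yet reachable via last_commit.
        self.versions[(v.item, v.ts)] = v

    def commit(self, item: str, ts: int) -> None:
        # COMMIT: advance the index to the highest committed timestamp.
        self.last_commit[item] = max(self.last_commit.get(item, 0), ts)

    def get(self, item: str, ts: Optional[int] = None) -> Optional[Version]:
        # No ts: return the last committed version; otherwise fetch that exact ts.
        if ts is None:
            ts = self.last_commit.get(item, 0)
        return self.versions.get((item, ts))


def put_all(parts: Dict[str, Partition], writes: Dict[str, object], ts: int) -> None:
    """Write-only transaction: PREPARE on every partition, then COMMIT."""
    for item, value in writes.items():
        parts[item].prepare(Version(item, value, ts, frozenset(writes) - {item}))
    for item in writes:
        parts[item].commit(item, ts)


def get_all(parts: Dict[str, Partition], items: List[str]) -> Dict[str, object]:
    """Read-only transaction: one round, plus a repair round if needed."""
    ret = {i: parts[i].get(i) for i in items}                  # round 1
    v_latest: Dict[str, int] = {}
    for v in ret.values():
        if v is None:
            continue
        for sibling in v.md:                                    # items v's txn also wrote
            v_latest[sibling] = max(v_latest.get(sibling, 0), v.ts)
    for i in items:                                             # round 2, only if missing
        have = ret[i].ts if ret[i] is not None else 0
        if v_latest.get(i, 0) > have:
            ret[i] = parts[i].get(i, v_latest[i])
    return {i: (v.value if v is not None else None) for i, v in ret.items()}


# Race: transaction ts=1 is prepared on both partitions, but its COMMIT
# has reached only x when the reader arrives.
parts = {"x": Partition(), "y": Partition()}
for item in ("x", "y"):
    parts[item].prepare(Version(item, 1, 1, frozenset({"x", "y"}) - {item}))
parts["x"].commit("x", 1)
result = get_all(parts, ["x", "y"])
print(result)  # {'x': 1, 'y': 1}: the missing y was fetched in round 2
```

The repair fetch can rely on the exact (item, timestamp) lookup because a timestamp only appears in the metadata of a committed version after every partition in that write set has already prepared it.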
RAMP-Fast (Algorithm)
RAMP-Small
RAMP-S uses constant-size metadata but always requires two RTTs for reads.
First round of reads: fetch the highest committed timestamp for each item from its respective partition.
Second round of reads: send the set of timestamps returned by the first round, and retrieve the highest-timestamped version of each item that also appears in that supplied set of timestamps.
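A minimal sketch of the two RAMP-S read rounds, assuming in-memory partitions, caller-supplied unique timestamps, and hypothetical names (`Partition`, `latest_ts`, `get_in_set`, `get_all`). Writes still go through PREPARE and COMMIT, but store nothing beyond the value and its timestamp.

```python
from typing import Dict, List, Optional, Set, Tuple


class Partition:
    def __init__(self) -> None:
        self.versions: Dict[Tuple[str, int], object] = {}   # (item, ts) -> value
        self.last_commit: Dict[str, int] = {}              # item -> highest committed ts

    def prepare(self, item: str, value: object, ts: int) -> None:
        # RAMP-S stores no per-write metadata beyond the timestamp itself.
        self.versions[(item, ts)] = value

    def commit(self, item: str, ts: int) -> None:
        self.last_commit[item] = max(self.last_commit.get(item, 0), ts)

    def latest_ts(self, item: str) -> int:
        # Round 1: highest committed timestamp for the item (0 = never written).
        return self.last_commit.get(item, 0)

    def get_in_set(self, item: str, ts_set: Set[int]) -> Optional[object]:
        # Round 2: highest-timestamped version of item whose ts is in ts_set.
        # A linear scan keeps the sketch short; a real store would index this.
        matches = [ts for (it, ts) in self.versions if it == item and ts in ts_set]
        return self.versions[(item, max(matches))] if matches else None


def get_all(parts: Dict[str, Partition], items: List[str]) -> Dict[str, Optional[object]]:
    ts_set = {parts[i].latest_ts(i) for i in items}              # round 1
    return {i: parts[i].get_in_set(i, ts_set) for i in items}   # round 2


# Race: transaction ts=1 is prepared on both partitions but committed only on x.
parts = {"x": Partition(), "y": Partition()}
for item in ("x", "y"):
    parts[item].prepare(item, 1, 1)
parts["x"].commit("x", 1)
result = get_all(parts, ["x", "y"])
print(result)  # {'x': 1, 'y': 1}
```

The trade-off is visible in `get_all`: the second round always runs, even when no write is in progress, which is the price RAMP-S pays for carrying no write-set metadata.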
RAMP-Small (Algorithm)
RAMP-Hybrid
RAMP-H Write: same as RAMP-F, but store a Bloom filter summarizing the transaction’s write set as the metadata.
RAMP-H Read: same as RAMP-F, except that the algorithm uses the Bloom filters to compute a list of potentially higher-timestamped writes for each item. Any potentially missing versions are fetched in a second round of reads.
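A minimal sketch of RAMP-H under stated assumptions: a tiny Bloom filter built on SHA-256 (its size `m = 64` and hash count `k = 3` are arbitrary illustrative choices, not values from the paper), in-memory partitions, caller-supplied unique timestamps, and hypothetical names throughout. For clarity, the candidates for each item are fetched one at a time from the highest timestamp down; a fetch that returns `None` is a Bloom-filter false positive and is skipped.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class BloomFilter:
    """Tiny Bloom filter; m and k are illustrative, not tuned."""

    def __init__(self, m: int = 64, k: int = 3) -> None:
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        # May return a false positive, never a false negative.
        return all((self.bits >> p) & 1 for p in self._positions(key))


Stored = Tuple[int, object, BloomFilter]   # (ts, value, write-set filter)


class Partition:
    def __init__(self) -> None:
        self.versions: Dict[Tuple[str, int], Tuple[object, BloomFilter]] = {}
        self.last_commit: Dict[str, int] = {}

    def prepare(self, item: str, value: object, ts: int, bf: BloomFilter) -> None:
        self.versions[(item, ts)] = (value, bf)

    def commit(self, item: str, ts: int) -> None:
        self.last_commit[item] = max(self.last_commit.get(item, 0), ts)

    def get(self, item: str, ts: Optional[int] = None) -> Optional[Stored]:
        if ts is None:
            ts = self.last_commit.get(item, 0)
        hit = self.versions.get((item, ts))
        return (ts, hit[0], hit[1]) if hit else None   # None if no such version


def get_all(parts: Dict[str, Partition], items: List[str]) -> Dict[str, object]:
    first = {i: parts[i].get(i) for i in items}                  # round 1
    ret = dict(first)
    for i in items:
        have = first[i][0] if first[i] else 0
        # Newer transactions whose filter says they *may* have written i.
        cands = sorted({v[0] for v in first.values()
                        if v and v[0] > have and v[2].might_contain(i)}, reverse=True)
        for ts in cands:                                          # round 2
            fetched = parts[i].get(i, ts)   # None here means a false positive
            if fetched:
                ret[i] = fetched
                break
    return {i: (v[1] if v else None) for i, v in ret.items()}


# Race: ts=1 wrote x and y, is prepared on both, but committed only on x.
bf = BloomFilter()
for key in ("x", "y"):
    bf.add(key)
parts = {"x": Partition(), "y": Partition()}
for key in ("x", "y"):
    parts[key].prepare(key, 1, 1, bf)
parts["x"].commit("x", 1)
result = get_all(parts, ["x", "y"])
print(result)  # {'x': 1, 'y': 1}
```

Because versions are looked up by their exact (item, timestamp) pair and timestamps are unique, a false positive can cost an extra fetch but cannot introduce a value the transaction never wrote.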
Bloom Filter
RAMP-Hybrid (Cont.)
Safety Properties
A Bloom filter may produce false positives.
The paper’s appendix proves that a false positive does not compromise the integrity of the result set: with unique timestamps, any read issued because of a false positive returns null.
RAMP-Hybrid (Algorithm)
Summary of Basic Algorithms
Experimental Evaluation
RAMP-F, RAMP-H, and often RAMP-S outperform existing solutions across a range of workload conditions, with overheads typically within 8% and no more than 48% of peak throughput.
Each algorithm is evaluated using the YCSB benchmark on several cr1.8xlarge instances on Amazon EC2, with a workload of 95% reads and 5% writes.
Notation
LWLR: Long write locks and long read locks, providing Repeatable Read isolation (PL-2.99)
LWSR: Long write locks with short read locks, providing Read Committed isolation (PL-2L; does not provide RA)
LWNR: Long write locks with no read locks, providing Read Uncommitted isolation (does not provide RA)
NWNR: No locks; baseline performance for parallelized operations
E-PCI: The Eiger system’s 2PC-PCI, in which a designated “coordinator” server enforces RA isolation for each transaction
Result
Result (Cont.)
Experimental: CTP Overhead
Cooperative Termination Protocol (CTP)
Some transactions may stall (for example, when a writer fails after PREPARE but before COMMIT completes), leaving blocked writes behind. CTP is used to “free” these blocked writes.
Even with blocked writes occurring at a modest rate of 1 in 1000 writes, the average-case overheads are small.
CTP Reference: P. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database Systems. Addison-Wesley, New York, 1987.
Experimental: Scalability
With 100 servers (spread across several EC2 availability zones), RAMP-F was within 2.6%, RAMP-H within 3.4%, and RAMP-S within 45% of NWNR’s throughput.
