Scalar DL is a scalable and practical Byzantine fault detection middleware for transactional database systems that achieves correctness, scalability, and database agnosticism. This is a slide deck presented at VLDB'22.
For more details about Scalar DL, please check out the paper and our GitHub site.
- https://dl.acm.org/doi/abs/10.14778/3523210.3523212
- https://github.com/scalar-labs/scalardl
Scalar DL: Scalable and Practical Byzantine Fault Detection for Transactional Database Systems (VLDB'22)
1. Scalar DL: Scalable and Practical Byzantine Fault Detection for Transactional Database Systems
Hiroyuki Yamada, Jun Nemoto
Scalar, Inc.
2. Towards a reliable database system
● We live in a data-driven / data-centric world.
○ Data needs to be reliable and trustworthy.
○ Database systems need to be reliable and trustworthy.
● Dealing with Byzantine faults in a database system is one of the key factors.
○ Byzantine faults: software errors, data tampering, (internal) malicious attacks.
Our Goal: A database system that deals with Byzantine faults in a practical and scalable way.
3. Dealing with Byzantine faults
● Basic principle: find discrepancies between replicas.
● Byzantine fault tolerance (BFT).
○ N > 3f, N: # of replicas, f: # of faulty replicas.
○ SMR: PBFT [OSDI’99], BFT-SMaRt [DSN’14], HotStuff [PODC’19] …
○ Database: HRDB [SOSP’07], Byzantium [EuroSys’11], Hyperledger Fabric [EuroSys’18], Basil [SOSP’21]
● Byzantine fault detection (BFD).
○ N > f, N: # of replicas, f: # of faulty replicas.
○ SMR: PeerReview [SOSP’07]
Are existing solutions practical and scalable enough for a database system?
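The two replica bounds above (N > 3f for BFT, N > f for BFD) can be turned into a small calculation. The following is a minimal sketch; the function names are illustrative and do not come from the paper or from Scalar DL.

```python
def min_replicas_bft(f: int) -> int:
    """Smallest N that satisfies N > 3f (Byzantine fault tolerance)."""
    return 3 * f + 1


def min_replicas_bfd(f: int) -> int:
    """Smallest N that satisfies N > f (Byzantine fault detection)."""
    return f + 1


# Compare the minimum replica counts for one and two faulty replicas.
for f in (1, 2):
    print(f"f={f}: BFT needs N>={min_replicas_bft(f)}, BFD needs N>={min_replicas_bfd(f)}")
```

For f = 1 this gives 4 replicas for BFT and 2 for BFD, which matches the 4-AD and 2-AD requirements discussed on the following slides.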
7. BFT is ideal, but may not be practical for database systems
● At least 4 administrative domains (ADs) are required for correctness.
○ Malicious attacks are likely to be dependent within an AD.
○ At least 4 ADs (AD-1 to AD-4) are required to mask 1 fault.
● BFT might not fit well with enterprise database systems.
○ Many enterprise database systems are managed by a single AD or a few ADs.
● An AD is a collection of nodes and networks operated by a single organization or administrative authority.
8. BFD is a promising approach for database systems
● Requires only 2 ADs for correctness.
○ 2 is the lower bound for the number of replicas in dealing with Byzantine faults.
○ 1 faulty AD can be detected as long as there are 2 ADs (AD-1 and AD-2).
● Many use cases require only BFD or tamper evidence.
○ Regulations on data protection and privacy (e.g., GDPR and CCPA), prior user rights for IP, and vehicle regulations around OTA software updates in WP.29.
● Existing solutions are not designed for transactional database systems.
○ They cannot run transactions in parallel (i.e., they are not scalable).
13. Challenge:
Scalable BFD for a database system deployed to a 2-AD environment
● SMR (run transactions sequentially)
○ BFT SMR: PBFT, BFT-SMaRt, HotStuff, Tendermint
○ BFD SMR: PeerReview
● DB (run transactions concurrently)
○ BFT DB: HRDB, Byzantium, Basil, Hyperledger Fabric
○ BFD DB: No existing work
● BFT (SMR and DB): Not practical from an administrative perspective
● BFD SMR: Not designed for database transactions
19. BFT DB => BFD DB
● Can we realize BFD by splitting up replicas into 2 ADs?
○ No. A BFT DB cannot trivially be extended to realize a BFD DB.
● 1 Byzantine-faulty replica will exceed the predefined threshold for correctness because Byzantine faults are dependent in an AD.
○ Example: with N = 4 replicas split across AD-1 and AD-2, one faulty AD means f = 2, so N > 3f no longer holds (4 < 6).
○ Need to accept the fault, i.e., data will be tampered with.
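The threshold argument on this slide can be checked numerically. The sketch below assumes, as the slide does, that faults are dependent within an AD, so a single faulty AD makes all of its replicas faulty. The function name and the list-based layout are illustrative only.

```python
def bft_survives_one_faulty_ad(replicas_per_ad: list) -> bool:
    """Check N > 3f when the largest AD turns entirely faulty.

    Assumption: faults are dependent within an AD, so the worst case for a
    single faulty AD is f = (number of replicas in the largest AD).
    """
    n = sum(replicas_per_ad)
    f = max(replicas_per_ad)
    return n > 3 * f


print(bft_survives_one_faulty_ad([2, 2]))        # 4 replicas in 2 ADs
print(bft_survives_one_faulty_ad([1, 1, 1, 1]))  # 4 replicas in 4 ADs
```

With 4 replicas in 2 ADs the check fails (4 is not greater than 6), while 4 replicas in 4 ADs pass (4 is greater than 3).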
20. BFD SMR => BFD DB
● Can we make BFD SMR (PeerReview) run transactions concurrently?
○ Yes, but only partially.
○ We could apply concurrency control to the primary-side processing.
● The witness side must execute the hash-chained log sequentially for correctness (i.e., strict serializability), which limits the overall scalability.
○ Running transactions in parallel could cause time-travel anomalies.
Figure: a Primary in AD-1 and a Witness (Auditor) in AD-2 process transactions T1 and T2 through a hash-chained log. Witness-side execution has to be sequential for correctness.
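To illustrate why a hash-chained log fixes a single total order, here is a minimal generic hash chain in Python. It is a sketch of the general technique only, not PeerReview's actual log format; the entry layout, the GENESIS value, and the function names are assumptions made for illustration.

```python
import hashlib

GENESIS = "0" * 64  # hypothetical hash used before the first entry


def entry_hash(prev_hash: str, payload: str) -> str:
    """Hash an entry together with the hash of its predecessor."""
    return hashlib.sha256((prev_hash + "|" + payload).encode("utf-8")).hexdigest()


def append(log: list, payload: str) -> None:
    """Append a payload, chaining it to the current head of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev": prev_hash,
                "hash": entry_hash(prev_hash, payload)})


def verify(log: list) -> bool:
    """Replay the chain sequentially and check every link."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


primary_log, reordered_log = [], []
for tx in ("T1", "T2"):
    append(primary_log, tx)
for tx in ("T2", "T1"):
    append(reordered_log, tx)

print(verify(primary_log))
print(primary_log[-1]["hash"] == reordered_log[-1]["hash"])
```

Because each hash depends on the previous one, the same transactions in a different order produce a different chain head, and verification has to walk the entries one by one.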
21. Challenge:
Scalable BFD for a database system deployed to a 2-AD environment
● SMR (run transactions sequentially)
○ BFT SMR: PBFT, BFT-SMaRt, HotStuff, Tendermint
○ BFD SMR: PeerReview
● DB (run transactions concurrently)
○ BFT DB: HRDB, Byzantium, Basil, Hyperledger Fabric
○ BFD DB: None
● BFT DB => BFD DB: Not possible (as it is)
● BFD SMR => BFD DB: Possible but not scalable
22. Scalar DL: A scalable and practical BFD approach
● Scalable and practical BFD middleware for transactional database systems.
○ Manages two types of servers and databases (primary and secondary) in separate ADs internally.
○ Database-agnostic by depending only on common database operations.
● Executes non-conflicting transactions in parallel while guaranteeing correctness.
● Correctness:
○ Provides safety (strict serializability) and liveness if there is no fault.
○ Provides safety (correct clients can detect a Byzantine fault) if one AD is faulty.
Figure: Applications use Scalar DL Clients to access a database system made of Scalar DL Primary Servers and a Primary Database in AD1, and Scalar DL Secondary Servers and a Secondary Database in AD2.
23. The BFD protocol - Overview
● Key idea: Make an agreement on the partial ordering of transactions in a decentralized and concurrent way.
○ Neither the primary nor the secondary can selfishly order/commit transactions.
● 3-phase protocol: Ordering -> Commit -> Validation.
○ The protocol assumes a one-shot request model.
Figure: message flow among the Client, Secondary, and Primary across the Ordering, Commit, and Validation phases.
24. The BFD protocol - Ordering phase
● Order transactions in a strictly serializable manner with a variant of 2PL.
○ Simulate a transaction and identify the read/write sets of the transaction.
○ Acquire R/W locks using the underlying database’s linearizable operations.
○ Go to the commit phase once all the required locks are acquired.
● Why not use multi-version concurrency control (MVCC)?
○ A primary and a secondary could derive different serialization orders without sharing explicit order dependencies (e.g., a conflict graph).
● Lock entry fields: Primary key, Version, Lock count, Lock mode, Lock holders (TxIDs), Input dependencies.
○ Input dependencies are a set of <primary-key, version> and indicate the partial order of transactions.
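The lock entry and the lock-then-commit flow of the ordering phase can be sketched as follows. This is a simplified in-memory illustration, not Scalar DL's implementation: the real system acquires locks through the underlying database's linearizable operations, whereas this sketch uses a plain dictionary. The class and method names are hypothetical, and the input dependencies field is declared to mirror the slide but left unpopulated.

```python
from dataclasses import dataclass, field


@dataclass
class LockEntry:
    # Fields mirror the lock entry listed on the slide.
    primary_key: str
    version: int = 0
    lock_count: int = 0
    lock_mode: str = ""  # "R" or "W"; empty when the entry is free
    lock_holders: set = field(default_factory=set)  # TxIDs
    input_dependencies: frozenset = frozenset()  # set of (primary_key, version)


class LockTable:
    """In-memory stand-in for lock entries stored in a database."""

    def __init__(self):
        self.entries = {}

    def try_acquire(self, tx_id: str, key: str, mode: str) -> bool:
        entry = self.entries.setdefault(key, LockEntry(key))
        if entry.lock_count == 0:
            entry.lock_mode = mode
        elif not (mode == "R" and entry.lock_mode == "R"):
            return False  # only shared read locks are compatible
        entry.lock_holders.add(tx_id)
        entry.lock_count = len(entry.lock_holders)
        return True

    def release(self, tx_id: str, key: str) -> None:
        entry = self.entries[key]
        entry.lock_holders.discard(tx_id)
        entry.lock_count = len(entry.lock_holders)
        if entry.lock_count == 0:
            entry.lock_mode = ""

    def acquire_all(self, tx_id: str, read_set: set, write_set: set) -> bool:
        """Acquire every lock for the simulated read/write sets, or none."""
        requests = [(k, "W") for k in sorted(write_set)]
        requests += [(k, "R") for k in sorted(read_set - write_set)]
        acquired = []
        for key, mode in requests:
            if not self.try_acquire(tx_id, key, mode):
                for held in acquired:
                    self.release(tx_id, held)
                return False
            acquired.append(key)
        return True


table = LockTable()
print(table.acquire_all("tx1", read_set={"y"}, write_set={"x"}))
print(table.acquire_all("tx2", read_set={"x"}, write_set={"z"}))
```

In this sketch a transaction proceeds to the commit phase only when `acquire_all` returns True; a conflicting transaction releases whatever it had acquired so that non-conflicting transactions can continue in parallel.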
25. The BFD protocol - Commit phase
● Execute transactions in an ACID way in an arbitrary order.
○ Also write a transaction status with a transaction ID as a key for recovery.
○ This is where a transaction is regarded as committed or aborted.
● Create proofs that indicate what records are read and written.
● The input dependencies indicate the partial order of transactions.
● Proof entry fields: Primary key, Version, TxID, Input dependencies, MAC.
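A proof entry with the fields listed above could be built and checked as in the following sketch. It assumes an HMAC-SHA256 over a JSON encoding of the other fields and a shared secret key; the slides only state that a proof carries a MAC, so the encoding, the key handling, and the function names are all assumptions made for illustration.

```python
import hashlib
import hmac
import json


def _encode(body: dict) -> bytes:
    # Deterministic encoding so both sides compute the MAC over the same bytes.
    return json.dumps(body, sort_keys=True).encode("utf-8")


def make_proof(secret: bytes, primary_key: str, version: int,
               tx_id: str, input_dependencies: list) -> dict:
    """Build a proof entry; input_dependencies is a list of (key, version)."""
    body = {
        "primary_key": primary_key,
        "version": version,
        "tx_id": tx_id,
        "input_dependencies": sorted([k, v] for k, v in input_dependencies),
    }
    mac = hmac.new(secret, _encode(body), hashlib.sha256).hexdigest()
    return {**body, "mac": mac}


def verify_proof(secret: bytes, proof: dict) -> bool:
    """Recompute the MAC over every field except the MAC itself."""
    body = {k: v for k, v in proof.items() if k != "mac"}
    expected = hmac.new(secret, _encode(body), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, proof["mac"])


secret = b"hypothetical-shared-secret"
proof = make_proof(secret, "account:1", 3, "tx42", [("account:1", 2)])
print(verify_proof(secret, proof))

tampered = dict(proof, version=4)
print(verify_proof(secret, tampered))
```

Changing any field after the proof is created, such as the version in the example, makes verification fail, which is what lets a proof serve as evidence of what was read and written.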
26. The BFD protocol - Validation phase
● Validate that the commit order is the same as the one the secondary expects.
○ Compare the lock entries and proofs.
● Execute transactions in the secondary once validated and create proofs.
● A client compares the results and proofs from the primary and the secondary to find discrepancies (i.e., Byzantine faults).
Figure: the Primary (2. Commit phase) and the Secondary (3. Validation phase) each return a result and proofs. Pre-validation compares against the lock table, and the Client compares the two sets of results and proofs.
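The client-side comparison in the validation phase can be sketched as a function that takes the two responses and reports any discrepancy. The response layout (a dict with "result" and "proofs") and the function names are hypothetical; the slides only state that the client compares results and proofs from both sides.

```python
def proof_key(proof: dict) -> tuple:
    """Reduce a proof to the fields compared in this sketch."""
    return (proof["primary_key"], proof["version"], proof["tx_id"])


def find_discrepancies(primary: dict, secondary: dict) -> list:
    """Return human-readable discrepancies; an empty list means agreement."""
    issues = []
    if primary["result"] != secondary["result"]:
        issues.append("result mismatch")
    primary_proofs = {proof_key(p) for p in primary["proofs"]}
    secondary_proofs = {proof_key(p) for p in secondary["proofs"]}
    # Proofs present on only one side indicate diverging reads or writes.
    for key in sorted(primary_proofs ^ secondary_proofs):
        issues.append(f"proof mismatch: {key}")
    return issues


primary_response = {
    "result": {"balance": 100},
    "proofs": [{"primary_key": "account:1", "version": 3, "tx_id": "tx42"}],
}
secondary_response = {
    "result": {"balance": 100},
    "proofs": [{"primary_key": "account:1", "version": 4, "tx_id": "tx42"}],
}
print(find_discrepancies(primary_response, primary_response))
print(find_discrepancies(primary_response, secondary_response))
```

A non-empty list corresponds to a detected Byzantine fault in this sketch; how the client reacts to it is outside what the slides describe.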
27. Evaluation - Benchmarked systems and workloads
● Benchmarked Systems:
○ PeerReviewTx: an extended version of PeerReview that runs transactions in parallel on the primary side.
○ Scalar DL: uses Scalar DB to execute transactions on non-transactional databases.
○ Both PeerReviewTx and Scalar DL servers are placed on the database instances.
○ PostgreSQL and Cassandra as backend database systems.
● Workloads
○ YCSB: workloads F and C. 100M records with 100-byte payloads and a uniform distribution.
○ TPC-C: 50/50 ratio of NewOrder and Payment. 100 to 1000 warehouses.
28. Evaluation - Experimental setup
● Environment
○ AWS. c5d.4xlarge for each database instance (8 cores, 32GB DRAM, NVMe SSD) and c5.9xlarge for a client.
○ 2 ADs in different VPCs.
Figure: two deployments, each with clients and two ADs. In one, each AD runs PostgreSQL with Scalar DL. In the other, each AD runs several Cassandra (C*) nodes, each with Scalar DL (DL).
29. Throughput on PostgreSQL
Figure: throughput on YCSB-F and TPC-C (NP).
Scalar DL scaled as the number of client threads increased, whereas PeerReviewTx didn’t scale as much. The benefit of Scalar DL comes from its concurrency control.
30. Throughput on Cassandra (3 nodes per AD, RF=3)
Figure: throughput on YCSB-F and TPC-C (NP).
The results were similar to those on PostgreSQL.
The database-agnostic property was also verified.
32. Summary
● Scalar DL is a scalable and practical BFD middleware for transactional database systems.
● Key contribution: a Byzantine fault detection protocol that executes non-conflicting transactions in parallel while guaranteeing correctness.
● Achieves up to a 10x speedup compared to the state-of-the-art BFD approach and near-linear (91%) node scalability.
● Scalar DL is a real product, not a research prototype.
○ See https://github.com/scalar-labs/scalardl