Active/Active Payments Processing
Ted Mao and Jiang-Ming Yang
• Resilient to datacenter-level failure
• Resilient to Internet routing
• Transparent to the merchant
• No human intervention
• Every second of uptime matters to
our merchants. The goal is five 9s (99.999%) availability.
• Much easier and safer to perform
Inconsistent state between
Datacenters can’t tell if a transaction
has already been processed
Payment networks can’t reliably
guarantee idempotence on retries.
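To make the problem concrete, here is a minimal sketch, assuming each datacenter deduplicates retries with its own local idempotency-key table. The `Datacenter` class, the `fake_charge` function, and the key `"txn-1"` are all hypothetical names introduced for illustration; they are not from the talk. A retry that lands on the same datacenter is caught, but a retry routed to the other datacenter charges again because the two tables are not in sync.

```python
class Datacenter:
    """Hypothetical datacenter with a purely local dedup table."""

    def __init__(self, name):
        self.name = name
        self.seen = {}  # idempotency key -> result of the first attempt

    def process(self, key, amount_cents, charge):
        # A retry seen by THIS datacenter returns the stored result.
        if key in self.seen:
            return self.seen[key]
        result = charge(amount_cents)
        self.seen[key] = result
        return result


charges = []  # every call that reached the (simulated) payment network


def fake_charge(amount_cents):
    # Stand-in for a payment network that does not deduplicate retries.
    charges.append(amount_cents)
    return f"charge-{len(charges)}"


dc_a = Datacenter("A")
dc_b = Datacenter("B")

first = dc_a.process("txn-1", 500, fake_charge)
retry_same_dc = dc_a.process("txn-1", 500, fake_charge)   # deduplicated
retry_other_dc = dc_b.process("txn-1", 500, fake_charge)  # charged again
```

In this sketch `charges` ends up with two entries for one logical transaction, which is the double-processing risk the bullets above describe.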
Real-time latency requirements
We can’t just wait until our
datacenters get in sync.
When a merchant sells items to customers, the customers have the
option to pay with multiple tenders.
1. CreateBill
3. CompleteBill / CancelBill
1. Each time we receive a tender request, we need to process the
tender immediately. Thus, different tenders for the same bill may be
processed at different data centers.
2. When we receive the CompleteBill request, we may need to wait for
the tender information from a remote data center.
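The two points above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the CompleteBill request carries the IDs of the tenders it expects, and that remote tenders arrive later through replication. `BillView`, `record_tender`, and `try_complete` are hypothetical names, not the system's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class BillView:
    """One datacenter's local view of a bill (hypothetical structure)."""

    expected_tender_ids: set
    known_tenders: dict = field(default_factory=dict)  # tender id -> cents

    def record_tender(self, tender_id, amount_cents):
        # Called for tenders processed locally or replicated from remote.
        self.known_tenders[tender_id] = amount_cents

    def try_complete(self):
        # CompleteBill can only finish once every expected tender is known.
        missing = self.expected_tender_ids - set(self.known_tenders)
        if missing:
            return ("WAITING", sorted(missing))
        return ("COMPLETED", sum(self.known_tenders.values()))


view = BillView(expected_tender_ids={"t1", "t2"})
view.record_tender("t1", 700)      # processed at the local datacenter
before = view.try_complete()       # t2 was processed remotely, not yet seen
view.record_tender("t2", 300)      # arrives via replication
after = view.try_complete()
```

Here `before` reports the bill as waiting on `t2`, and `after` reports it as completed with the combined total once the remote tender is visible.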
Tender state machine
Bill state machine
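The slides' state diagrams are not reproduced here, so the following is a hedged sketch of what such state machines might look like as transition tables. The state names are assumptions inferred from the operations mentioned in the talk (auth, capture, CreateBill, CompleteBill, CancelBill); the real states and transitions may differ.

```python
# Hypothetical transition tables: state -> set of states reachable from it.
TENDER_TRANSITIONS = {
    "PENDING": {"AUTHORIZED", "FAILED"},
    "AUTHORIZED": {"CAPTURED", "VOIDED"},
    "CAPTURED": set(),
    "VOIDED": set(),
    "FAILED": set(),
}

BILL_TRANSITIONS = {
    "OPEN": {"COMPLETED", "CANCELED"},
    "COMPLETED": set(),
    "CANCELED": set(),
}


def advance(transitions, current, target):
    """Return the new state, or raise if the transition is not allowed."""
    if target not in transitions.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target


tender_state = advance(TENDER_TRANSITIONS, "PENDING", "AUTHORIZED")
tender_state = advance(TENDER_TRANSITIONS, tender_state, "CAPTURED")
bill_state = advance(BILL_TRANSITIONS, "OPEN", "COMPLETED")

try:
    advance(BILL_TRANSITIONS, "COMPLETED", "CANCELED")
    rejected = False
except ValueError:
    rejected = True  # terminal states accept no further transitions
```

Encoding the allowed transitions as data keeps the rules in one place, which makes them easier to check exhaustively than logic scattered across request handlers.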
1. A formal proof
2. Simulate all the possible operational combinations and verify the
Asynchronous, eventually consistent
systems are harder to reason about.
Active/active systems are harder to
design, implement, and test.
If the original data center is down and
never comes back, we may not be able
to perform the capture due to the loss
of the original auth.
Not all downstream effects are
We want a storage solution with the
1. Horizontally scalable
2. Tolerant to DC failure
CockroachDB: a Scalable, Geo-Replicated,
Transactional Datastore