For today’s mission-critical web, mobile, and IoT applications, high availability is not a “nice to have” feature; it is an essential requirement. Downtime and degraded performance are unacceptable, resulting in unhappy customers and lost revenue.
Join this demo-filled session to learn how to deliver continuously available mission-critical applications across data centers using Couchbase Server and the brand-new multi-cluster-aware SDK extension. We will cover a wide array of high availability and disaster recovery features (especially those introduced in Couchbase Server 5.0) and how to leverage them to support highly reliable, responsive applications that result in excellent customer experiences.
When a failure occurs, you have a couple of options: 1. bring in a new node, or 2. add the failed node back in and do a delta recovery — a beautiful thing. There are lots of reasons a node may be only briefly unavailable: the node goes down for a short period of time, routine maintenance is scheduled, or network connectivity is briefly disrupted.
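The appeal of delta recovery is that the returning node keeps the data it already has and only catches up on the mutations it missed, rather than being rebuilt from scratch. A minimal sketch of that idea (hypothetical function and data shapes, not the actual Couchbase implementation):

```python
# Delta recovery sketch: the returning node keeps its existing data and
# applies only the mutations past its last-seen sequence number.

def delta_recover(node_data, node_seqno, cluster_log):
    """cluster_log is a list of (seqno, key, value) mutations recorded
    while the node was down; replay only the ones the node missed."""
    for seqno, key, value in cluster_log:
        if seqno > node_seqno:
            node_data[key] = value
            node_seqno = seqno
    return node_data, node_seqno

data = {"a": 1}                                   # state when the node went down
log = [(1, "a", 1), (2, "b", 2), (3, "a", 3)]     # mutations seen by the cluster
data, seq = delta_recover(data, 1, log)           # catches up on seqnos 2 and 3
```

A full recovery would instead discard the node's data and stream everything back; for a short outage the delta is a tiny fraction of that.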
Many uses for XDCR. One of the fundamental differentiators is that you can include or exclude certain buckets — for example, session data — to minimize traffic on the network. In 4.0 we added filtering, so certain data types within the same bucket can be replicated selectively. Very cool for UK data restrictions.
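XDCR filtering selects documents by matching a regular expression against their keys. A plain-Python sketch of that selection logic (the helper name and in-memory bucket are illustrative, not the XDCR API):

```python
import re

def filter_for_replication(documents, key_pattern):
    """Return only the documents whose keys match the replication
    filter, mimicking XDCR's regex-based key filtering."""
    pattern = re.compile(key_pattern)
    return {key: value for key, value in documents.items()
            if pattern.match(key)}

docs = {
    "user::1001": {"name": "Ada"},
    "session::abc": {"ttl": 300},    # session data we choose not to replicate
    "user::1002": {"name": "Lin"},
}

# Replicate only user documents; session data stays local to the cluster.
replicated = filter_for_replication(docs, r"^user::")
```

Excluding churny, short-lived data like sessions is exactly the kind of thing that keeps cross-data-center traffic down.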
Prior to CB 4.6 we had only revision-based conflict resolution, which is basically “most updated wins”: we look at the document with the highest revision number and pick that as the winner. Although this is a simple and effective mechanism to resolve conflicts, it has many limitations. With 4.6 we introduced timestamp-based conflict resolution, which is basically “most recent update wins”: we compare the documents’ timestamps and pick the winner with the most recent timestamp. This is well known as Last Write Wins (LWW). Timestamp-based conflict resolution is based on a Hybrid Logical Clock (HLC). For timestamp-based conflict resolution to work effectively, both the source and destination clusters need their clocks synchronized using NTP or another time-synchronization method.
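The two policies can be sketched side by side in plain Python (illustrative types and function names, not the server's internals) — note they can pick different winners for the same conflicting pair:

```python
from dataclasses import dataclass

@dataclass
class DocVersion:
    revision: int   # mutation count (revision seqno)
    hlc_cas: int    # Hybrid Logical Clock timestamp stored in CAS
    body: dict

def resolve_by_revision(a, b):
    """Revision-based: the version mutated most often wins."""
    return a if a.revision >= b.revision else b

def resolve_by_timestamp(a, b):
    """Timestamp-based (Last Write Wins): the most recent HLC wins."""
    return a if a.hlc_cas >= b.hlc_cas else b

local  = DocVersion(revision=5, hlc_cas=1000, body={"v": "old"})
remote = DocVersion(revision=2, hlc_cas=2000, body={"v": "new"})

rev_winner = resolve_by_revision(local, remote)   # local: 5 revisions > 2
lww_winner = resolve_by_timestamp(local, remote)  # remote: newer timestamp
```

This disagreement is one of the limitations of revision-based resolution: a document updated many times long ago can beat a genuinely newer write.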
HLC is a combination of a physical clock and a logical counter: the first 48 bits store the physical timestamp and the next 16 bits store the logical counter. It is stored in the CAS value, and as we know, CAS is part of the document metadata. For the HLC logical counter to overflow into the physical time, it would require 640M mutations/s (64K mutations per 0.1 ms).
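The 48 + 16 bit layout can be shown with a couple of bit-twiddling helpers (a sketch of the packing scheme described above, not Couchbase source code). A nice side effect of this layout is that comparing raw CAS values orders by physical time first and logical counter second, which is exactly what Last Write Wins needs:

```python
def pack_hlc(physical_ms, logical_counter):
    """Pack an HLC into a 64-bit CAS value: the high 48 bits hold the
    physical timestamp, the low 16 bits hold the logical counter."""
    assert 0 <= logical_counter < (1 << 16)
    assert 0 <= physical_ms < (1 << 48)
    return (physical_ms << 16) | logical_counter

def unpack_hlc(cas):
    """Split a CAS value back into (physical timestamp, counter)."""
    return cas >> 16, cas & 0xFFFF

cas = pack_hlc(physical_ms=1_500_000_000_000, logical_counter=7)
phys, counter = unpack_hlc(cas)

# Two writes in the same physical millisecond are still ordered by counter,
# and any later millisecond beats any counter value.
same_ms_later = pack_hlc(1_500_000_000_000, 8)
next_ms       = pack_hlc(1_500_000_000_001, 0)
```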
Here is an example of cross-cluster failover and failback. Two clusters, A and B, have bidirectional replication between them. When one of the nodes or an entire cluster becomes unavailable, the application can automatically switch its traffic over to the other cluster for high availability. When the failed cluster comes back, data needs to be synchronized before application traffic is redirected to it, and timestamp-based conflict resolution guarantees consistency during that synchronization.
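The client-side switch-over can be sketched as a preference-ordered list of clusters with fallback on connection failure (hypothetical classes for illustration — this is not the Couchbase multi-cluster-aware SDK API):

```python
class MultiClusterClient:
    """Try the preferred cluster first; fail over to the next one
    when it is unreachable."""

    def __init__(self, clusters):
        self.clusters = clusters  # ordered by preference

    def get(self, key):
        last_error = None
        for cluster in self.clusters:
            try:
                return cluster.get(key)
            except ConnectionError as err:
                last_error = err      # cluster down: try the next one
        raise last_error

class FakeCluster:
    """Stand-in for a real cluster connection, for the sketch only."""
    def __init__(self, name, data, up=True):
        self.name, self.data, self.up = name, data, up

    def get(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} unreachable")
        return self.data[key]

primary = FakeCluster("A", {"k": "from-A"}, up=False)  # simulate an outage
standby = FakeCluster("B", {"k": "from-B"})
client = MultiClusterClient([primary, standby])
value = client.get("k")   # transparently served by cluster B
```

Failback is the reverse: once cluster A is resynchronized, it moves back to the front of the preference list.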
In 3.0 we introduced incremental backups, in differential and cumulative modes. Prior to 3.0, a backup was always a full copy at the bucket, cluster, or node level; as you can imagine, for large deployments managing disk space could get costly. A differential backup stores just the deltas since the previous backup (e.g., the previous day’s), while a cumulative backup includes everything since the last full backup.
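The difference between the two modes comes down to which baseline you measure changes against. A small sketch, assuming one backup per day (the helper and history format are invented for illustration):

```python
# full: everything; differential: changes since the previous backup;
# cumulative: all changes since the last full backup.

def changes_since(history, since_day, today):
    """All keys mutated after `since_day`, up to and including `today`.
    `history` maps day -> set of keys mutated that day."""
    return {key for day, keys in history.items()
            if since_day < day <= today for key in keys}

history = {1: {"a", "b"}, 2: {"b"}, 3: {"c"}}     # mutations per day

full_day1 = {"a", "b"}                            # full backup on day 1
diff_day2 = changes_since(history, 1, 2)          # deltas since day 1's backup
diff_day3 = changes_since(history, 2, 3)          # deltas since day 2's backup
cumu_day3 = changes_since(history, 1, 3)          # everything since the full
```

Restoring from differentials means replaying the full backup plus every differential in order; restoring from a cumulative needs only the full backup plus the latest cumulative, at the cost of more disk per backup.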
Maximizing high availability for your cluster – Connect Silicon Valley 2017