
Ensuring HA and DR in Couchbase 5.0 – Couchbase Connect New York 2017


For today’s mission-critical web, mobile, and IoT applications, high availability is not a “nice to have” feature; it is an essential requirement. Downtime, data loss, and degraded performance are unacceptable, resulting in unhappy customers and lost revenue.
Join this demo-filled session to learn how to deliver continuously available, mission-critical applications across data centers. This session covers the wide array of high availability and disaster recovery features in Couchbase Server (especially those introduced in Couchbase Server 5.0) and how to leverage them to build highly reliable, responsive applications that deliver excellent customer experiences.



  1. ©2017 Couchbase Inc. High Availability / Disaster Recovery – Mel Boulos / Austin Gonyou, Solutions Engineers, Couchbase
  2. ©2017 Couchbase Inc. Next 40 minutes … • Part I - High Availability • Single node architecture • Local data redundancy • Rebalance and failover • Node recovery • Part II - Disaster Recovery • Business continuity for “mission-critical” applications • Geo redundancy • Backup-Restore for worst case scenario • Part III – What’s New in 5.0 • Part IV – HA/DR Technologies and Methodologies • Demo
  3. ©2017 Couchbase Inc. Part I - High Availability
  4. ©2017 Couchbase Inc. Couchbase Server – Single Node Architecture ▪ Single node type is the foundation for the high availability architecture ▪ No Single Point of Failure (SPOF) ▪ Easy scalability [Diagram: three identical Couchbase Server nodes, each running a Cluster Manager, Data/Index/Query Services, a Managed Cache, and Storage holding active and replica shards]
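The sharding on slide 4 can be sketched in a few lines: Couchbase hashes each document key with CRC32 to choose one of 1024 vBuckets (shards). This is an illustrative sketch only; the exact bit selection varies by SDK version, and `vbucket_to_server` below is a hypothetical even spread, not the real cluster map.

```python
import zlib

NUM_VBUCKETS = 1024  # Couchbase's default vBucket (shard) count


def key_to_vbucket(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document key to a vBucket id via CRC32 (simplified sketch)."""
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return ((crc >> 16) & 0x7FFF) % num_vbuckets


def vbucket_to_server(vbucket_id: int, num_servers: int) -> int:
    """Hypothetical even spread of vBuckets across servers, for illustration."""
    return vbucket_id % num_servers


vb = key_to_vbucket("user::1234")
print("vbucket:", vb, "-> server:", vbucket_to_server(vb, 3))
```

Because the mapping is deterministic, any client with the current cluster map routes a given key to the same server, which is what makes the shared-nothing scaling on this slide work.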
  5. ©2017 Couchbase Inc. Intra-Cluster Replication – Data Redundancy Intra-cluster replication is the process of replicating data to multiple servers within a cluster in order to provide data redundancy. ▪ RAM-to-RAM replication ▪ Maximum of 4 copies of data in a cluster ▪ Bandwidth optimized through de-duplication (“de-dup”) of items
  6. ©2017 Couchbase Inc. Failover Operation – Fault Tolerance • Failover automatically switches over to the replicas for a given database • Gracefully under node maintenance • Immediately under auto-failover • Can be triggered manually through the Admin UI/REST/CLI • Automatic failover in case of unplanned outages – system failures • Can be configured through the Admin UI/REST/CLI • Constraints in place to avoid “split-brain” and false positives • 30-second delay, multiple heartbeat “pings” • Clusters of >=3 nodes • Only one node down at a time
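The manual trigger mentioned above is available from the command line. A hedged sketch using `couchbase-cli` (hostnames and credentials are placeholders; verify the flags against your server version):

```shell
# Graceful failover of a node for planned maintenance (placeholder host/credentials)
couchbase-cli failover -c cluster.example.com:8091 -u Administrator -p password \
  --server-failover node3.example.com:8091

# Immediate (hard) failover, as auto-failover would perform on an outage
couchbase-cli failover -c cluster.example.com:8091 -u Administrator -p password \
  --server-failover node3.example.com:8091 --force
```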
  7. ©2017 Couchbase Inc. Rack-Zone Awareness ▪ Informs Couchbase of the physical distribution of nodes across racks or availability groups ▪ Ensures that active and replica vBuckets are distributed across groups ▪ Servers 1, 2, 3 on Rack 1 ▪ Servers 4, 5, 6 on Rack 2 ▪ Servers 7, 8, 9 on Rack 3 ▪ Cluster has 2 replicas (3 copies of data) ▪ This is a balanced configuration
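Rack-Zone Awareness is configured through server groups. A sketch with `couchbase-cli group-manage` (group names, hosts, and credentials are placeholders):

```shell
# Create a group per rack or availability zone (placeholder name)
couchbase-cli group-manage -c cluster.example.com:8091 -u Administrator -p password \
  --create --group-name "Rack 2"

# Move servers into the group, then rebalance so replica vBuckets
# land in different groups than their actives
couchbase-cli group-manage -c cluster.example.com:8091 -u Administrator -p password \
  --move-servers node4.example.com:8091,node5.example.com:8091 \
  --from-group "Group 1" --to-group "Rack 2"
couchbase-cli rebalance -c cluster.example.com:8091 -u Administrator -p password
```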
  8. ©2017 Couchbase Inc. Rack-Zone Awareness ▪ If an entire rack fails, data is still available ▪ If an entire cloud availability zone fails, data is still available
  9. ©2017 Couchbase Inc. Automatic Failover – “In Action” ▪ App servers accessing shards ▪ Requests to Server 3 fail ▪ Cluster detects the server has failed ▪ Promotes replicas of its shards to active ▪ Updates the cluster map ▪ Requests for docs now go to the appropriate server ▪ Typically a rebalance would follow [Diagram: five servers holding active and replica shards; app servers route requests through the Couchbase client library’s cluster map]
  10. ©2017 Couchbase Inc. Node Recovery – Bring the Cluster Back to Capacity • A failed-over node can be re-added to the cluster • Full recovery – add it back as a fresh node • Delta node recovery – add the failed node back incrementally, without having to rebuild the full node
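Either recovery mode can be chosen from the CLI when re-adding the node; a sketch (placeholder host and credentials, flags per recent `couchbase-cli` versions):

```shell
# Delta recovery: catch the node up incrementally instead of rebuilding it
couchbase-cli recovery -c cluster.example.com:8091 -u Administrator -p password \
  --server-recovery node3.example.com:8091 --recovery-type delta

# A rebalance completes the recovery and returns the cluster to capacity
couchbase-cli rebalance -c cluster.example.com:8091 -u Administrator -p password
```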
  11. ©2017 Couchbase Inc. Modern Architecture – Multi-Dimensional Scaling (Couchbase 4.x) MDS is the architecture that enables independent scaling of data, query, and indexing workloads while being managed as one cluster.
  12. ©2017 Couchbase Inc. Modern Architecture – Multi-Dimensional Scaling (Couchbase 4.x) [diagram]
  13. ©2017 Couchbase Inc. Modern Architecture – Multi-Dimensional Scaling (Couchbase 4.x) [diagram]
  14. ©2017 Couchbase Inc. Part II - Disaster Recovery
  15. ©2017 Couchbase Inc. Cross Datacenter Replication (XDCR) – Unidirectional or Bidirectional Replication Unidirectional ▪ Hot spare / Disaster Recovery ▪ Development/testing copies ▪ Connectors (Solr, Elasticsearch) ▪ Integration with a custom consumer Bidirectional ▪ Multiple active masters ▪ Disaster Recovery ▪ Datacenter locality
  16. ©2017 Couchbase Inc. Cross Datacenter Replication (XDCR) Using DCP • Continuously replicates data from a source cluster to remote clusters, which may be spread across geographies • Supports unidirectional and bidirectional operation • Applications can read and write from both clusters (active-active replication) • Automatically handles node addition and removal • Simplified administration via the Admin UI, REST, and CLI • Pause and resume XDCR replication • (New in 4.0) Filtering of data on the replication stream
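Setting up an XDCR link has two steps: register the remote cluster, then create a per-bucket replication. A sketch with `couchbase-cli` (cluster names, hosts, bucket names, and credentials are placeholders):

```shell
# Register the remote cluster as an XDCR target
couchbase-cli xdcr-setup -c source.example.com:8091 -u Administrator -p password \
  --create --xdcr-cluster-name dr-site \
  --xdcr-hostname remote.example.com:8091 \
  --xdcr-username Administrator --xdcr-password password

# Replicate a bucket to the remote cluster; run the mirror-image command
# on the remote cluster as well to make the replication bidirectional
couchbase-cli xdcr-replicate -c source.example.com:8091 -u Administrator -p password \
  --create --xdcr-cluster-name dr-site \
  --xdcr-from-bucket travel-sample --xdcr-to-bucket travel-sample
```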
  17. ©2017 Couchbase Inc. XDCR – Memory-Based, Using DCP [Diagram: a write from the application server lands in the managed cache, then flows in parallel to disk, the indexer, intra-cluster replicas, and remote clusters via cross datacenter replication]
  18. ©2017 Couchbase Inc. Backup and Restore – Key Features • Zero-downtime backup and restore • Single utility to manage multiple clusters • Restore from any point, to any bucket or topology • Fully differential – merge backups to maintain desired restore points • Built-in concurrency control • Resume interrupted backups • Compaction reduces storage requirements
  19. ©2017 Couchbase Inc. Efficient Recovery with Incremental Backup & Restore • Minimize time and resources during backups • Back up only the data updated since the last backup • Differential backups • Cumulative backups
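These features are exposed through the `cbbackupmgr` utility; after the first full backup, subsequent runs are incremental. A sketch (archive path, host, and credentials are placeholders):

```shell
# One-time: create a backup archive and a repository inside it
cbbackupmgr config --archive /backups --repo nightly

# Each run after the first backs up only data changed since the last backup
cbbackupmgr backup --archive /backups --repo nightly \
  --cluster couchbase://cluster.example.com \
  --username Administrator --password password

# Restore (resumable) from the repository into the target cluster
cbbackupmgr restore --archive /backups --repo nightly \
  --cluster couchbase://cluster.example.com \
  --username Administrator --password password
```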
  20. ©2017 Couchbase Inc. Part III – What’s New in 5.0
  21. ©2017 Couchbase Inc. XDCR Conflict Resolution Modes ● Revision-based Conflict Resolution [Default] Current XDCR conflict resolution uses the revision ID (part of the document metadata) as the first field to resolve conflicts between two writes across clusters. Revision IDs track the number of mutations to a key, so this mode is best characterized as “the most updates wins”. ● Timestamp-based Conflict Resolution [New] Timestamp-based conflict resolution uses the hybrid logical clock (part of the document metadata) as the first field to resolve conflicts between two writes across clusters. The timestamp combines physical time (NTP) with a logical counter, so this mode is also known as Last Write Wins (LWW) and is best characterized as “the most recent update wins”.
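The difference between the two modes can be shown with a toy comparison. This is a simplified sketch, not the server's implementation; the real resolution logic also consults additional metadata fields as tie-breakers.

```python
from dataclasses import dataclass


@dataclass
class DocVersion:
    """Illustrative metadata for one cluster's copy of a document."""
    rev: int   # revision ID: number of mutations to the key
    cas: int   # hybrid logical clock, stored in the CAS value
    body: str


def resolve_by_revisions(a: DocVersion, b: DocVersion) -> DocVersion:
    """Default mode: 'the most updates wins' (higher revision count)."""
    return a if a.rev >= b.rev else b


def resolve_by_timestamp(a: DocVersion, b: DocVersion) -> DocVersion:
    """LWW mode: 'the most recent update wins' (higher hybrid logical clock)."""
    return a if a.cas >= b.cas else b
```

Note that the two modes can pick different winners for the same pair of writes: a document mutated many times early on beats a single later write under revision-based resolution, but loses under timestamp-based resolution.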
  22. ©2017 Couchbase Inc. What Is a Hybrid Logical Clock? • A hybrid logical clock is a combination of physical time and a logical counter • It is represented as a 64-bit integer • First 48 bits – physical time • Last 16 bits – logical counter • The hybrid logical clock is stored in the document’s CAS value
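The 48/16-bit split described above can be illustrated by packing and unpacking the 64-bit value. A minimal sketch; the units and exact semantics of the physical component are simplified here.

```python
def pack_hlc(physical: int, counter: int) -> int:
    """Pack 48 bits of physical time and a 16-bit logical counter into 64 bits."""
    assert 0 <= physical < (1 << 48) and 0 <= counter < (1 << 16)
    return (physical << 16) | counter


def unpack_hlc(hlc: int):
    """Split a 64-bit hybrid logical clock back into (physical, counter)."""
    return hlc >> 16, hlc & 0xFFFF
```

Because the physical time occupies the high-order bits, comparing two packed values as plain integers orders them by physical time first, with the logical counter breaking ties between writes in the same instant, which is exactly what LWW conflict resolution needs.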
  23. ©2017 Couchbase Inc. High Availability – Cross-Cluster Failover & Failback XDCR is used by many customers as an availability solution. Failing over to a secondary cluster following cluster-wide or partial failures is critical for customers to provide availability at the database tier.
  24. ©2017 Couchbase Inc. Robust Failure Detector - Pluggable design - More monitors can be added in the future
  25. ©2017 Couchbase Inc. Fast Failover – Motivation ● One of Couchbase Server’s key high availability features is automatic failover. ● Automatic failover happens within the cluster after a minimum timeout of 30 seconds. Currently, customers with reliable network infrastructure and appropriately sized clusters cannot reduce the auto-failover timeout below 30 seconds. ● Also, the current failover mechanism cannot quickly identify false positives and recover from, or prevent, failover under false positives.
  26. ©2017 Couchbase Inc. Auto-Failover in ~5-10 Seconds ● Customers with reliable network infrastructure should be able to reduce the auto-failover timeout from 30 seconds to under 5-10 seconds [Amadeus, Apple, eBay, PayPal, Comptel, and more] ● The failover mechanism effectively identifies false positives and recovers from, or prevents, failover under false positives ● The total downtime experienced by the application is under 5-10 seconds (downtime: the time between the first write failing and the first write succeeding again after failover)
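With 5.0 the auto-failover timeout floor drops accordingly; a sketch of lowering it via `couchbase-cli` (placeholder host and credentials, and only advisable on reliable networks):

```shell
# Enable auto-failover with a 5-second timeout
# (5.0 lowers the minimum from the previous 30-second floor)
couchbase-cli setting-autofailover -c cluster.example.com:8091 \
  -u Administrator -p password \
  --enable-auto-failover 1 --auto-failover-timeout 5
```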
  27. ©2017 Couchbase Inc. Database HA/DR Methodologies and Technologies
  28. ©2017 Couchbase Inc. Austin Gonyou, Sr. Solutions Engineer, Couchbase Server & Mobile, Austin.Gonyou@couchbase.com Background: 25 years in the technology space • Linux Sys Admin • Oracle DBA • Oracle Performance Manager • Linux Kernel Contributor • MIT Kerberos Contributor • LIDS Contributor • SGI XFS Project Contributor • SystemImager Contributor • Former USMC
  29. ©2017 Couchbase Inc. HA/DR Methodologies • Data Duplication • Volume Mirroring • Host-based HA tools • tar, cpio, rsync, etc. • Host Replication • Storage Mirroring • Backup replication • Clustering software • Cloud deployments • Replication Tools • Dell/EMC TimeFinder, SRDF, RecoverPoint (Unity) • VMware FT/HA/DRS • DRBD • SQL Replication • Data Guard • GoldenGate • HA/Clustering tools • Veritas Cluster • Oracle RAC • Oracle Grid • Neverfail Cluster • Pacemaker • HAProxy • Heartbeat • And MANY, MANY MORE!
  30. ©2017 Couchbase Inc. Traditional Components of HA/DR Infrastructure • Servers or virtual machines • On-premises or “in the cloud” • Networking • Virtual or physical • Load balancers • Ethernet or InfiniBand • Reverse proxies and routers • Storage • Virtual and physical • Software layers • OS volumes • SAN • NAS • Shared • Software • Clustering • Replication
  31. ©2017 Couchbase Inc. HA/DR Common Approach [Diagram: application servers reach a pair of database servers through a VIP; the database servers share storage, with separate app, network, and data tiers]
  32. ©2017 Couchbase Inc. HA/DR Common Approach [diagram]
  33. ©2017 Couchbase Inc. HA/DR New Approach [Diagram: application servers 1 through N reach clustered database servers through a VIP backed by shared storage]
  34. ©2017 Couchbase Inc. Multi-Site Active-Active or HA/DR [Diagram: two data centers, each with an application cluster and a database cluster behind its own VIP (DC1 VIP, DC2 VIP)]
  35. ©2017 Couchbase Inc. Couchbase HA and DR • Couchbase Replication • Built-in • Local (within cluster) • Remote, with filtering (outside cluster) • Conflict resolution • Flexible topologies • Unidirectional • Bidirectional • Multi-master • Multi-cluster aware SDKs • Tolerant of inconsistent clusters and faults • Restartable
  36. ©2017 Couchbase Inc. XDCR Demonstration
  37. ©2017 Couchbase Inc. Thank You!
