Next Level High Availability with Couchbase Server 5.5 – Connect Silicon Valley 2018

Speaker: Dave Finlay, VP of Software Engineering, Couchbase

A critical component of delivering an amazing customer experience is ensuring that the experience is always available. That’s why the Couchbase Data Platform delivers key high availability features such as: zero downtime administration and maintenance, data redundancy, and automatic failover. In this session, we’ll introduce the exciting additions to the high availability capabilities of Couchbase Server 5.5 and talk about how they can be used to take your deployment to the next level.

Published in: Technology

  1. Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2018. All rights reserved. HIGHER AVAILABILITY: Take Your Availability to the Next Level with Couchbase 5.5. Sept 19, 2018. Dave Finlay | VP Engineering, Couchbase. Michael Nitschinger | Principal Engineer, Couchbase.
  2. What is High Availability?
  3. What is High Availability? Wikipedia: “High availability is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.”
  4. What is High Availability? Wikipedia: “High availability is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.” Is Up
  5. What is High Availability? Wikipedia: “High availability is a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.” Is Up, For a Long Time
  6. How Hard is That?
  7. What Can Go Wrong?
  8. What Can Go Wrong? Courtesy of Jeff Dean of Google, from LADIS 2009: http://www.cs.cornell.edu/projects/ladis2009/program.htm#keynote3
  9. What Can Go Wrong? Planned. Courtesy of Jeff Dean of Google, from LADIS 2009: http://www.cs.cornell.edu/projects/ladis2009/program.htm#keynote3
  10. What Can Go Wrong? Planned. Courtesy of Jeff Dean of Google, from LADIS 2009: http://www.cs.cornell.edu/projects/ladis2009/program.htm#keynote3 Couchbase XDCR: youtube.com/watch?v=eejpTK-bay8
  11. What Can Go Wrong? Data-Center Level. Courtesy of Jeff Dean of Google, from LADIS 2009: http://www.cs.cornell.edu/projects/ladis2009/program.htm#keynote3
  12. What Can Go Wrong? Data-Center Level: Couchbase XDCR + Multi-Cluster Awareness. Courtesy of Jeff Dean of Google, from LADIS 2009: http://www.cs.cornell.edu/projects/ladis2009/program.htm#keynote3
  13. What Can Go Wrong? Rack-Level Failures. Courtesy of Jeff Dean of Google, from LADIS 2009: http://www.cs.cornell.edu/projects/ladis2009/program.htm#keynote3
  14. What Can Go Wrong? Lots and Lots and Lots of Other Kinds of Failures. Courtesy of Jeff Dean of Google, from LADIS 2009: http://www.cs.cornell.edu/projects/ladis2009/program.htm#keynote3
  15. Time Matters: Measuring Availability. MTTF: Mean Time To Failure.
  16. Time Matters: Measuring Availability. MTTF: Mean Time To Failure. MTTR: Mean Time To Repair.
  17. Time Matters: Measuring Availability. MTTF: Mean Time To Failure. MTTR: Mean Time To Repair. Availability = MTTF / (MTTF + MTTR)
  18. Time Matters: Measuring Availability. MTTF: Mean Time To Failure. MTTR: Mean Time To Repair. Availability = MTTF / (MTTF + MTTR). Want MTTF to be BIG (long time between failures).
  19. Time Matters: Measuring Availability. MTTF: Mean Time To Failure. MTTR: Mean Time To Repair. Availability = MTTF / (MTTF + MTTR). Want MTTF to be BIG (long time between failures). Want MTTR to be small (repair quickly).
  20. Time Matters: Measuring Availability

      | Expression      | Uptime (%) | Downtime / Year |
      |-----------------|------------|-----------------|
      | Three 9s        | 99.9       | 8 hrs 46 mins   |
      | Four 9s         | 99.99      | 52 mins 34 secs |
      | Four 9s and a 5 | 99.995     | 26 mins 17 secs |
      | Five 9s         | 99.999     | 5 mins 15 secs  |
      | Six 9s          | 99.9999    | 32 secs         |

  21. That’s Ridiculous!
  22. That’s Ridiculous! How the hell can my system possibly survive in such an environment?
  23. Don’t Panic!
  24. Don’t Panic! We’ve got your back :)
  25. Achieving HA. Wikipedia: 1. Elimination of single points of failure. 2. Reliable crossover (i.e. reliable configuration / topology change). 3. Fast failure detection.
  26. Let’s Look at a Couchbase Cluster. [Diagram: Nodes 1-6 across server Groups 1-3.] Server groups are used to reflect the rack (or zone) topology.
  27. Let’s Look at a Couchbase Cluster: Data. [Diagram: Nodes 1-6 across Groups 1-3, holding three copies of partition 1; legend: active partition (vbucket), replica partition (vbucket).] Replicas are placed on different nodes, in different groups from the active.
  28. Let’s Look at a Couchbase Cluster: Data. [Diagram: Nodes 1-6 across Groups 1-3, holding three copies each of partitions 1-6; legend: active partition (vbucket), replica partition (vbucket).] In this example, each server serves 1 active and 2 replica partitions.
  29. Let’s Look at a Couchbase Cluster: Data. [Diagram: the SDK issues SET K,V to the cluster; REPLICATION arrows between groups.] Writes are … immediately replicated.
  30. Let’s Look at a Couchbase Cluster: Data. [Diagram: the SDK issues SET K,V to the cluster; REPLICATION arrows between groups.] Writes are … immediately replicated.
  31. Let’s Look at a Couchbase Cluster: Auto-Failover. [Diagram: the SDK issues SET K,V to the cluster; REPLICATION arrows between groups.] Auto-failover is enabled by default in 5.5.
  32. Let’s Look at a Couchbase Cluster: Auto-Failover. [Diagram: the SDK issues SET K,V to the cluster; REPLICATION arrows between groups.] Multi-node auto-failover is new in 5.5.
  33. Let’s Look at a Couchbase Cluster: Auto-Failover. [Diagram: the SDK issues SET K,V to the cluster; REPLICATION arrows between groups.] Server group auto-failover is new in 5.5.
  34. Let’s Look at a Couchbase Cluster: Orchestrator. [Diagram: Nodes 1-6 across Groups 1-3 with the orchestrator marked “O”; label: promote.] Orchestrator makes the decision to auto failover.
  35. Let’s Look at a Couchbase Cluster: Orchestrator. [Diagram: Nodes 1-6 across Groups 1-3 with the orchestrator marked “O”; label: promote.] Orchestrator makes the decision to auto failover. However, it cannot be a SPOF: it must itself be highly available. Orchestrator failover is improved in 5.5.
  36. Under The Hood: Robust Failure Detector. [Diagram: three nodes, each with a Data service and a Cluster Manager, one marked Orchestrator; REPLICATION between Data services.]
  37. Under The Hood: Robust Failure Detector. [Diagram: as before, with a Data Monitor added on each node.]
  38. Under The Hood: Robust Failure Detector. [Diagram: as before, with a Cluster Manager Monitor added on each node and HEART BEATS exchanged between nodes.]
  39. Under The Hood: Robust Failure Detector. [Diagram: as before, with a Node Monitor added on each node.]
  40. Under The Hood: Robust Failure Detector. [Diagram: each node has a Data Monitor, a Cluster Manager Monitor and a Node Monitor; HEART BEATS exchanged between nodes.] Data Service Monitor tracks disk health in 5.5.
  41. Let’s Look at a Couchbase Cluster: Query & Index. [Diagram: Nodes 1-6 across Groups 1-3 running six Data, three Index and three Query services; REPLICATION arrows between groups.]
  42. Let’s Look at a Couchbase Cluster: Query & Index. [Diagram: as before, with three copies of index I1.] Index replicas support highly available queries.
  43. Let’s Look at a Couchbase Cluster: Query & Index. [Diagram: as before; the SDK issues SELECT * FROM MESSAGES WHERE TO = "dave".] Index replicas support highly available queries.
  44. Let’s Look at a Couchbase Cluster: Query & Index. [Diagram: as before; the SDK issues SELECT * FROM MESSAGES WHERE TO = "dave".] Index replicas support highly available queries. If an index is unavailable, Query will automatically scan a replica.
  45. Let’s Look at a Couchbase Cluster: Query & Index. [Diagram: as before; the SDK issues SELECT * FROM MESSAGES WHERE TO = "dave".] Index replicas support highly available queries. If Query is unavailable, the SDK will automatically try an alternate Query service.
  46. DEMO
  47. Summary
      • Building highly available systems is hard
      • One way for things to work right; an infinite variety of ways for them to go wrong
      • Couchbase’s Distributed Data Platform is:
          • Designed to be resilient and fault-tolerant from the ground up
          • Across key-value, query and indexing and more
          • Made to be deployed in today’s cloud infrastructure and data centers
      • In 5.5 we’ve taken the High Availability options to a new level
  48. Thank you! Questions? WRITE A COUCHBASE REVIEW: http://bit.ly/TrustCB. DOWNLOAD THE MOBILE APP. WI-FI: SSID: Couchbase, Password: Rackspace. EVENT HASHTAG: #CBConnect. COUCHBASE LIVE: Chat with us on Facebook Live (near registration area).
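
As a quick numerical check of the availability formula (slides 17 to 19) and the downtime table (slide 20), the sketch below computes availability from MTTF and MTTR and converts an uptime percentage into downtime per year. It assumes a 365-day year; the function names and the sample figures are illustrative and do not come from the talk.

```python
# Assumption: a 365-day (non-leap) year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR), as a fraction between 0 and 1."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_seconds_per_year(uptime_percent: float) -> float:
    """Seconds of downtime per year implied by an uptime percentage."""
    return (1 - uptime_percent / 100) * SECONDS_PER_YEAR


def format_duration(seconds: float) -> str:
    """Render a duration as hours, minutes and seconds (rounded to 1 s)."""
    total = round(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours} hrs {minutes} mins {secs} secs"


if __name__ == "__main__":
    # Hypothetical example: one failure every 999 hours, 1 hour to repair.
    print(f"availability: {availability(999, 1):.4f}")
    for pct in (99.9, 99.99, 99.995, 99.999, 99.9999):
        print(pct, format_duration(downtime_seconds_per_year(pct)))
```

Under these assumptions, 99.9% uptime works out to 8 hrs 45 mins 36 secs of downtime per year, which the slide rounds to 8 hrs 46 mins. Halving MTTR has roughly the same effect on availability as doubling MTTF, which is why fast detection and failover matter as much as avoiding failures.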

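The heartbeat-based failure detection on slides 36 to 40 can be illustrated with a small timeout monitor. This is a generic sketch, not Couchbase's implementation: the `HeartbeatMonitor` class, the 5-second timeout, and the `nodes_to_fail_over` policy (do nothing when more nodes are suspected than the configured limit allows) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: a node is suspected once its heartbeat is stale."""

    timeout_seconds: float
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, node: str, now: float) -> None:
        # Timestamps are passed in explicitly so the sketch is deterministic.
        self.last_seen[node] = now

    def suspected_down(self, now: float) -> List[str]:
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


def nodes_to_fail_over(suspects: List[str], max_count: int) -> List[str]:
    """Illustrative policy: act only if the suspects fit within max_count.

    Refusing to act on a large simultaneous outage is a conservative choice
    made for this sketch, not a statement of how Couchbase decides.
    """
    return list(suspects) if len(suspects) <= max_count else []


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout_seconds=5.0)
    monitor.record_heartbeat("node1", now=0.0)
    monitor.record_heartbeat("node2", now=0.0)
    monitor.record_heartbeat("node1", now=8.0)  # node2 stays silent
    suspects = monitor.suspected_down(now=10.0)
    print(suspects, nodes_to_fail_over(suspects, max_count=1))
```

The timeout directly bounds detection time and therefore MTTR: a shorter timeout repairs faster but raises the risk of failing over a node that was only briefly slow.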
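
Slide 45 says the SDK automatically tries an alternate Query service when one is unavailable. The sketch below shows the general try-the-next-endpoint pattern behind that idea. It is not the Couchbase SDK's code: `run_with_failover`, `AllEndpointsFailed`, and the fake query function are hypothetical names, and a real client would also handle timeouts and retry limits.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class AllEndpointsFailed(Exception):
    """Raised when no endpoint could serve the request."""


def run_with_failover(
    endpoints: Sequence[str], run_query: Callable[[str], T]
) -> T:
    """Try each endpoint in order and return the first successful result."""
    errors: List[Tuple[str, Exception]] = []
    for endpoint in endpoints:
        try:
            return run_query(endpoint)
        except ConnectionError as exc:
            # Remember the failure and move on to the next endpoint.
            errors.append((endpoint, exc))
    raise AllEndpointsFailed(errors)


if __name__ == "__main__":
    unavailable = {"node1"}  # pretend the Query service on node1 is down

    def fake_query(endpoint: str) -> str:
        if endpoint in unavailable:
            raise ConnectionError(f"{endpoint} unreachable")
        return f"rows from {endpoint}"

    print(run_with_failover(["node1", "node3", "node5"], fake_query))
```

Only connection-level errors trigger the fallback here; a query that fails for other reasons (for example a syntax error) would fail the same way on every node, so retrying it elsewhere would only add latency.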