Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

5 Levels of High Availability: From Multi-instance to Hybrid Cloud

176 views

Published on

5 Levels of High Availability: From Multi-instance to Hybrid Cloud

Published in: Software
  • Be the first to comment

  • Be the first to like this

5 Levels of High Availability: From Multi-instance to Hybrid Cloud

  1. 1. 5 Levels of High Availability From Multi-instance to Hybrid Cloud Rafał Leszko @RafalLeszko rafalleszko.com Hazelcast
  2. 2. About me ● Cloud Software Engineer at Hazelcast ● Worked at Google and CERN ● Author of the book "Continuous Delivery with Docker and Jenkins" ● Trainer and conference speaker ● Live in Kraków, Poland
  3. 3. About Hazelcast ● Distributed Company ● Open Source Software ● 140+ Employees ● Products: ○ Hazelcast IMDG ○ Hazelcast Jet ○ Hazelcast Cloud @Hazelcast
  4. 4. ● Introduction ● High Availability Levels ○ Level 0: Single Instance ○ Level 1: Multi Instance ○ Level 2: Multi Zone ○ Level 3: Multi Region ○ Level 4: Multi Cloud ○ Level 5: Hybrid Cloud ● Summary Agenda
  5. 5. Introduction
  6. 6. application service
  7. 7. micro-service application service
  8. 8. micro? application service
  9. 9. application service
  10. 10. application service
  11. 11. application service
  12. 12. Stateless
  13. 13. application service
  14. 14. application servicedata store
  15. 15. application servicedata store queue
  16. 16. application servicedata store queue other service
  17. 17. Data is the problem!
  18. 18. ● Introduction ✔ ● High Availability Levels ○ Level 0: Single Instance ○ Level 1: Multi Instance ○ Level 2: Multi Zone ○ Level 3: Multi Region ○ Level 4: Multi Cloud ○ Level 5: Hybrid Cloud ● Summary Agenda
  19. 19. Level 0: Single Instance
  20. 20. application service data store Level 0: Single Instance machine request
  21. 21. Level 0: Single Instance YOLO!
  22. 22. LATENCY EXPERIMENT
  23. 23. What does "Level 0: Single Instance" mean to You? No high availability! No scalability! Super low latency: ● in-process memory ● no network ● local file system Data consistency
  24. 24. ● Introduction ✔ ● High Availability Levels ○ Level 0: Single Instance ✔ ○ Level 1: Multi Instance ○ Level 2: Multi Zone ○ Level 3: Multi Region ○ Level 4: Multi Cloud ○ Level 5: Hybrid Cloud ● Summary Agenda
  25. 25. Level 1: Multi Instance
  26. 26. If one machine is down, the system is still available
  27. 27. application data Level 1: Multi Instance request data load balancer application application application application machine 1 machine 2 machine 3 machine 4
  28. 28. application data Level 1: Multi Instance data application application application application machine 1 machine 2 machine 3 machine 4
  29. 29. data Level 1: Multi Instance data machine 3 machine 4
  30. 30. data store Level 1: Multi Instance machine 1 data store machine 2 Assumptions: ● Local network ● Fast ● Reliable For example: ● EC2 Instances in the same availability zone ● GCP VM instances in the same zone ● Your on-premises server machines connected with LAN
  31. 31. data store Level 1: Multi Instance machine 1 data store machine 2
  32. 32. Data replication
  33. 33. data store Option 1: Active-Passive (Master-Slave) Replication machine 1 data store machine 2 application service application service application service active passive SQL
  34. 34. data store Option 1: Active-Passive (Master-Slave) Replication machine 1 data store machine 2 application service application service application service active passive
  35. 35. data store Option 1: Active-Passive (Master-Slave) Replication machine 1 data store machine 2 application service application service application service down active
  36. 36. data store Option 1: Active-Passive (Master-Slave) Replication machine 1 data store machine 2 application service application service application service active passive SQL
  37. 37. data store [A-G] Option 2: Clustering machine 1 data store [H-S] machine 2 data store [T-Z] machine 3 NoSQL
  38. 38. data store [A-G] Option 2: Clustering machine 1 data store [H-S] machine 2 application service application service application service data store [T-Z] machine 3
  39. 39. data store [A-G] Option 2: Clustering machine 1 data store [H-S] machine 2 application service application service application service data store [T-Z] machine 3
  40. 40. 1 Hazelcast Member 1 data partitions 4 7 data partition backups 3 5 9 3 Hazelcast Member 3 data partitions 6 9 data partition backups 2 4 8 2 Hazelcast Member 2 data partitions 5 8 data partition backups 1 6 7
  41. 41. data store [A-G] Option 2: Clustering machine 1 data store [H-S] machine 2 data store [T-Z] machine 3 NoSQL
  42. 42. Synchronous vs Asynchronous
  43. 43. Synchronous (Consistency) or Asynchronous (Latency)? data store machine 1 machine 2 machine 3 data store machine 1 data store machine 2 active passive data store data store
  44. 44. LATENCY EXPERIMENT
  45. 45. Synchronous (Consistency) or Asynchronous (Latency)? data store machine 1 machine 2 machine 3 data store machine 1 data store machine 2 active passive data store data store synchronous
  46. 46. Synchronous (Consistency) or Asynchronous (Latency)? data store machine 1 machine 2 machine 3 data store machine 1 data store machine 2 active passive data store data store synchronous?
  47. 47. What does "Level 1: Multi Instance" mean to You? Data consistency! Most tools supported Cloud-specific toolkit (e.g. AWS SQS) Simple setup (even on-premises) High latency if accessed multi regions
  48. 48. ● Introduction ✔ ● High Availability Levels ○ Level 0: Single Instance ✔ ○ Level 1: Multi Instance ✔ ○ Level 2: Multi Zone ○ Level 3: Multi Region ○ Level 4: Multi Cloud ○ Level 5: Hybrid Cloud ● Summary Agenda
  49. 49. Level 2: Multi Zone
  50. 50. If one availability zone is down, the system is still available
  51. 51. application data Level 2: Multi Zone request data load balancer application application application application zone 1 zone 2
  52. 52. Is multi-zone deployment any different?
  53. 53. No
  54. 54. No but… Yes
  55. 55. LATENCY EXPERIMENT
  56. 56. No but… Yes
  57. 57. Level 2: Multi Zone zone 1 zone 2 data data data data
  58. 58. Level 2: Multi Zone zone 1 zone 2 data data data data Assumptions: ● Machines in at least 2 AZ ● Fast and reliable network For example: ● EC2 Instances in 2 AWS Availability Zones ● Azure VM instances in 2 Availability Sets
  59. 59. 1 Hazelcast Member 1 data partitions 4 7 data partition backups 3 5 9 3 Hazelcast Member 3 data partitions 6 9 data partition backups 2 4 8 2 Hazelcast Member 2 data partitions 5 8 data partition backups 1 6 7
  60. 60. Level 2: Multi Zone machine 1 machine 2 Hazelcast Member 2 zone 1 zone 2 Hazelcast Member 1 machine 3 machine 4 Hazelcast Member 4Hazelcast Member 3
  61. 61. Hazelcast configuration: hazelcast: partition-group: enabled: true group-type: ZONE_AWARE Hazelcast Zone Aware Feature
  62. 62. Hazelcast Zone Aware Feature
  63. 63. Level 2: Multi Zone machine 1 machine 2 Hazelcast Member 2 zone 1 zone 2 Hazelcast Member 1 machine 3 machine 4 Hazelcast Member 4Hazelcast Member 3
  64. 64. No but… Yes
  65. 65. Make sure your data store is ZONE AWARE
  66. 66. What does "Level 2: Multi Zone" mean to You? Currently top 1 choice! Data consistency! Cloud-specific toolkit (e.g. AWS SQS) High latency if accessed multi regions Not all tools are "zone aware"
  67. 67. ● Introduction ✔ ● High Availability Levels ○ Level 0: Single Instance ✔ ○ Level 1: Multi Instance ✔ ○ Level 2: Multi Zone ✔ ○ Level 3: Multi Region ○ Level 4: Multi Cloud ○ Level 5: Hybrid Cloud ● Summary Agenda
  68. 68. Level 3: Multi Region
  69. 69. If one region is down, the system is still available
  70. 70. application data Level 3: Multi Region geo load balancer region 1 load balancer load balancer region 2 application application application application application application application data data data data data geo replication
  71. 71. data Level 3: Multi Region region 1 region 2 data data data data data geo replication Assumptions: ● Machines in at least 2 geographical regions ● Network may be slow and unreliable For example: ● EC2 Instances in regions: eu-central- 1 and us-west-2
  72. 72. 10 000 km
  73. 73. Speed of light: 300 000 km/s Distance: 10 000 km RTT (Round Trip Time) = 60 ms Level 3: Multi Region
  74. 74. Geo-replication
  75. 75. data data data geo replication Level 3: Multi Region (Geo-replication) data data data
  76. 76. Geo-replication ● It's asynchronous ● Your data store must support it ● You must be prepared for data loss ● Two modes: ○ Active-Passive ○ Active-Active
  77. 77. data data data geo replication Active-Passive Geo-replication data data data active passive
  78. 78. data data data geo replication Active-Passive Geo-replication data data data active passive ● data loss possible ● (eventual) consistency
  79. 79. data data data geo replication Active-Active Geo-replication data data data active active
  80. 80. data data data geo replication Active-Active Geo-replication data data data active active ● data loss possible ● eventual consistency ● conflict resolution
  81. 81. Hazelcast WAN Replication hazelcast: wan-replication: batch-publisher: target-endpoints: 35.184.122.109
  82. 82. Do I really need to lose consistency?
  83. 83. LATENCY EXPERIMENT
  84. 84. Strong Consistency in Multi Region ● NewSQL (Spanner, CockroachDB) ● Multi-region distributed transactions ● Consensus algorithms (Paxos, Raft) ● Always a trade-off: consistency vs latency
  85. 85. What does "Level 3: Multi Region" mean to You? Super high available! Low latency if accessed from multi regions Sometimes possible to use Cloud-specific toolkit (e.g. Google Spanner - yes, AWS Elasticache - no) Geo-replication (asynchronous)! Eventual consistency (conflict resolution)
  86. 86. ● Introduction ✔ ● High Availability Levels ○ Level 0: Single Instance ✔ ○ Level 1: Multi Instance ✔ ○ Level 2: Multi Zone ✔ ○ Level 3: Multi Region ✔ ○ Level 4: Multi Cloud ○ Level 5: Hybrid Cloud ● Summary Agenda
  87. 87. Level 4: Multi Cloud
  88. 88. If one cloud provider is down, the system is still available
  89. 89. app data Level 4: Multi Cloud global load balancer load balancer app app data data replication app data load balancer app app data data
  90. 90. data Level 4: Multi Cloud data data replication data data data cloud provider 1 cloud provider 2 Assumptions: ● Machines in at least 2 cloud providers ● Network may be slow and unreliable ● Machines may be in different geo regions For example: ● EC2 Instances in eu-central-1 and GCP VM Instances in us-west1-a
  91. 91. What's different from multi-region?
  92. 92. Level 4: Multi Cloud ● No Cloud-specific tools ● No VPC Peering across Cloud providers ○ Latency ○ Security ● Cost
  93. 93. Is High Availability the only reason for Multi-Cloud?
  94. 94. Reasons for Multi-Cloud ● High Availability / Disaster Recovery ● Avoiding vendor lock-in ● Cloud cost optimization ● Risk Mitigation ● Low latency ● Data Protection / Regulations / Compliance ● Best-Fit Technology (Cloud-specific portfolios)
  95. 95. What does "Level 4: Multi Cloud" mean to You? No vendor lock-in! Cloud cost negotiations Low latency if accessed from multi-cloud Complex setup! No Cloud toolkit (e.g. AWS SQS)
  96. 96. ● Introduction ✔ ● High Availability Levels ○ Level 0: Single Instance ✔ ○ Level 1: Multi Instance ✔ ○ Level 2: Multi Zone ✔ ○ Level 3: Multi Region ✔ ○ Level 4: Multi Cloud ✔ ○ Level 5: Hybrid Cloud ● Summary Agenda
  97. 97. Level 5: Hybrid Cloud
  98. 98. If all cloud providers are down, the system is still available
  99. 99. Is it possible that all cloud providers are down?
  100. 100. No!
  101. 101. Reasons for Hybrid Cloud ● Data requirements / regulations ● Data security ● Moving to Cloud ● Cost reduction ● All mentioned already in Multi-Cloud
  102. 102. Level 5: Hybrid Cloud global load balancer On-Premises
  103. 103. What does "Level 5: Hybrid Cloud" mean to You? No Cloud lock-in! Low latency if accessed from custom networks Super complex setup! Usually extra layer needed (e.g. Kubernetes, OpenShift) Costs a fortune!
  104. 104. ● Introduction ✔ ● High Availability Levels ○ Level 0: Single Instance ✔ ○ Level 1: Multi Instance ✔ ○ Level 2: Multi Zone ✔ ○ Level 3: Multi Region ✔ ○ Level 4: Multi Cloud ✔ ○ Level 5: Hybrid Cloud ✔ ● Summary Agenda
  105. 105. Summary
  106. 106. Single Instance Multi Instance Multi Zone Multi Region Multi Cloud Hybrid Cloud
  107. 107. distributed system (data replication) Single Instance Multi Instance Multi Zone Multi Region Multi Cloud Hybrid Cloud
  108. 108. distributed system (data replication) Single Instance Multi Instance Multi Zone Multi Region Multi Cloud Hybrid Cloud zone aware
  109. 109. distributed system (data replication) Single Instance Multi Instance Multi Zone Multi Region Multi Cloud Hybrid Cloud long distance (geo- replication) zone aware
  110. 110. distributed system (data replication) Single Instance Multi Instance Multi Zone Multi Region Multi Cloud Hybrid Cloud no cloud tools long distance (geo- replication) zone aware
  111. 111. distributed system (data replication) Single Instance Multi Instance Multi Zone Multi Region Multi Cloud Hybrid Cloud no cloud tools own infrastructure long distance (geo- replication) zone aware
  112. 112. Which High Availability Level is for me?
  113. 113. Single Instance Multi Instance Multi Zone Multi Region Multi Cloud Hybrid Cloud
  114. 114. Single Instance Multi Instance Multi Zone Multi Region Multi Cloud Hybrid Cloud
  115. 115. Single Instance Multi Instance Multi Zone Multi Region Multi Cloud Hybrid Cloud
  116. 116. Single Instance Multi Instance Multi Zone Multi Region Multi Cloud Hybrid Cloud
  117. 117. Single Instance Multi Instance Multi Zone Multi Region Multi Cloud Hybrid Cloud
  118. 118. Thank You! Rafał Leszko @RafalLeszko rafalleszko.com

×