High Availability - How to get 99.99% service availability - Designing clusters (DOs & DON'Ts)

Presentation at BarcampSaigon 2013 - RMIT 7th July
Presenter: Lukas Rypl


  1. High Availability
     Lukas Rypl, Twitter: @LukasRypl
     7th July 2013
  2. Agenda
     • Intro - what and why (3 mins)
     • Describing Requirements (10 mins)
     • Solutions (7 mins)
     • Q&A
     Lukas Rypl (BarCamp Saigon) High Availability 7th July 2013 2 / 17
  4. Customers Talking about Requirements
     • fault tolerance
     • fail-over solution
     • high availability
     • disaster recovery
     • geographic redundancy
     • cluster
  5. What You Need to Know
     • Why do they need it?
     • Budget
  7. Use Only One, Some or All
     • Active-Active (Master-Master)
     • Active-Standby (Master-Slave)
     • Operations: read/write, read-only, none
  9. RPO and RTO
     • RTO: Recovery Time Objective
     • RPO: Recovery Point Objective
     (Beyond Redundancy: How Geographic Redundancy Can Improve Service Availability and Reliability of Computer-based Systems, by Eric Bauer, Randee Adams, Daniel Eustace)
  10. Availability 24x7
      SLA       Weekly   Monthly   Yearly
      99%       1h 40m   7h 12m    3 days 15 hrs
      99.9%     10m      43m 12s   8h 45m
      99.99%    1m       4m 19s    52m 33s
      99.999%   6s       26s       5m 15s
  15. What You Need to Know
      • How to handle network partitioning?
      • Manual or automatic failover and recovery?
      • Local or geographical redundancy?
      • Connection parameters (BW, RTT, L2/L3)
      • Other goals - load balancing?
  16. Solutions - Layers
      (High Availability and Disaster Recovery: Concepts, Design, Implementation, by Klaus Schmidt)
  17. Solutions - Hardware
      • disks: RAID 1, 10, 5, ... (controller with BBWC)
      • power supply
      • network interface: teaming/bonding
      • out-of-band management (HP iLO, Dell DRAC, IBM RSA)
  18. Solutions - Data
      • snapshots
      • filesystem: DRBD
      • NAS, SAN
  19. CAP (Brewer's) theorem
      http://blog.rizzif.com/2011/08/31/intro-to-nosql/
  20. Solutions - Databases
      • MySQL - Master-Master replication
      • PostgreSQL - Master-Slave, 3rd party for Master-Master
      • Oracle RAC
  21. Solutions - CRM
      • corosync + pacemaker
      • resources (application, IP address, ...)
      • rules (where, when, what requires)
      • master-slave
      • Partitioning not allowed
      • STONITH required
      • www.clusterlabs.org
  22. Q & A
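
The downtime figures in the "Availability 24x7" slide follow from simple arithmetic: the allowed downtime is the unavailable fraction of the SLA multiplied by the length of the period. The sketch below is not part of the original presentation. The function names are hypothetical, and it assumes a 7-day week, a 30-day month and a 365-day year, which appear to match the slide's numbers. Fractional seconds are truncated here, so a value the slide rounds (for example 26s for 99.999% monthly) can differ by one second.

```python
from decimal import Decimal

# Assumed period lengths in days (not stated on the slide).
PERIOD_DAYS = {"weekly": 7, "monthly": 30, "yearly": 365}
SECONDS_PER_DAY = 86400


def allowed_downtime_seconds(sla_percent: str, period_days: int) -> Decimal:
    """Downtime budget in seconds for a 24x7 service at the given SLA.

    The SLA is passed as a string (e.g. "99.99") so that Decimal can
    represent it exactly and avoid binary floating-point error.
    """
    unavailable_fraction = (Decimal(100) - Decimal(sla_percent)) / Decimal(100)
    return unavailable_fraction * period_days * SECONDS_PER_DAY


def format_duration(seconds) -> str:
    """Render seconds as e.g. '52m 33s', truncating fractional seconds."""
    total = int(seconds)
    days, rem = divmod(total, SECONDS_PER_DAY)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    units = ((days, "d"), (hours, "h"), (minutes, "m"), (secs, "s"))
    parts = [f"{value}{unit}" for value, unit in units if value]
    return " ".join(parts) or "0s"


if __name__ == "__main__":
    # Print one row per SLA level used on the slide.
    for sla in ("99", "99.9", "99.99", "99.999"):
        row = [
            format_duration(allowed_downtime_seconds(sla, days))
            for days in PERIOD_DAYS.values()
        ]
        print(f"{sla}%", *row, sep="\t")
```

Read this way, the 99.99% target in the title leaves under an hour of downtime per year, which is worth weighing against the manual versus automatic failover question raised in the requirements slides.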
