Disaster Recovery - On-Premise & Cloud

We will cover different scenarios for Disaster Recovery

Published in: Technology, Business

  1. 1. CLOUDCONF 2014 Database: backup and disaster recovery in Cloud Walter Dal Mut @walterdalmut – www.corley.it – walterdalmut.com
  2. 2. DISASTER RECOVERY Disaster recovery (DR) is about preparing for and recovering from a disaster.
  3. 3. DISASTER Any event that has a negative impact on your business continuity or finances could be termed a disaster.
  4. 4. WHY WE ARE TALKING ABOUT DR?
      • Over 70% of businesses involved in a major fire either do not reopen, or subsequently fail within 3 years of the fire. (Source: continuitycentral.com)
      • 80% of businesses affected by a major incident either never re-open or close within 18 months. (Source: Axa)
      • 70 percent of companies go out of business after a major data loss. (Source: continuitycentral.com)
      • 80% of businesses suffering a computer disaster, who have no disaster recovery plans, go out of business. (Source: “A Bridge Too Far”, IBM Business Recovery Service & Cranfield, 1993)
      • A recent study from Gartner, Inc., found that 90 percent of companies that experience data loss go out of business within two years.
      • 80 percent of companies without well-conceived data protection and recovery strategies go out of business within 2 years of a major disaster. (Source: US National Archives and Records Administration)
  5. 5. RTO – RECOVERY TIME OBJECTIVE This is the duration of time and the service level to which a business process must be restored after a disaster.
  6. 6. RTO – WHAT IT IMPLIES?
      • We have a system that records 1000 transactions per hour
      • We take a snapshot of the system at 03:00 am (every day)
      • At 10:00 am a disaster event occurs
      • We spend 1 hour sorting things out for the backup (off-site, preparation, etc.)
      • The recover operation takes 4 hours to get back to operation (at minimum service level)
      • 1 hour + 4 hours = 5 hours is the RECOVERY TIME OBJECTIVE
  7. 7. RPO – RECOVERY POINT OBJECTIVE This describes the acceptable amount of data loss measured in time.
  8. 8. RPO – WHAT IT IMPLIES?
      • We have a system that records 1000 transactions per hour
      • We take a snapshot of the system at 03:00 am (every day)
      • At 10:00 am a disaster event occurs
      • In this case we lost around 7000 transactions:
        • 1000 transactions 03:00-04:00
        • 1000 transactions 04:00-05:00
        • …
      • But: with a daily snapshot we are accepting up to 24 hours of data loss, i.e. 24000 transactions (RPO)
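The arithmetic behind the two RTO and RPO examples can be sketched in a few lines of Python. This is only an illustration of the numbers on the slides: the calendar date is arbitrary, and the function names are made up for this sketch.

```python
from datetime import datetime, timedelta

TX_PER_HOUR = 1000  # transaction rate used in the slides


def transactions_lost(last_snapshot: datetime, disaster: datetime,
                      tx_per_hour: int = TX_PER_HOUR) -> int:
    """Transactions recorded after the last snapshot, which a restore cannot bring back."""
    hours = (disaster - last_snapshot).total_seconds() / 3600
    return int(hours * tx_per_hour)


def recovery_time(preparation: timedelta, restore: timedelta) -> timedelta:
    """Time to get back to the minimum service level: preparation plus restore."""
    return preparation + restore


# The date is an arbitrary assumption; only the times of day come from the slides.
snapshot = datetime(2014, 1, 1, 3, 0)
disaster = datetime(2014, 1, 1, 10, 0)

lost = transactions_lost(snapshot, disaster)      # actual loss in this incident
worst_case = 24 * TX_PER_HOUR                     # loss accepted with a daily snapshot
rto = recovery_time(timedelta(hours=1), timedelta(hours=4))

print(lost, worst_case, rto)  # 7000 24000 5:00:00
```

The gap between 7000 and 24000 is the point of the slide: the RPO is the loss you accept in the worst case, not the loss of one particular incident.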
  9. 9. DISASTER RECOVERY STRATEGIES Local tape backup • Online backup • Pilot-Light • Warm Stand-by • And More… [Figure: the strategies laid out along a cost scale ($, $$$, $$$$$$) and a recovery-time scale (Seconds, Days)]
  10. 10. ON-PREMISE & CLOUD Use cloud resources in order to provide business continuity
  11. 11. Disaster Recovery & Cloud?
      • On Demand: we can allocate and release new resources whenever we need
      • Cost Effective: pay-as-you-go model. We pay only for the resources that we are actually using
      • Scalable: we can scale freely and adapt our strategy thanks to autoscaling and other mechanisms
      • Secure: control doesn’t mean security
  12. 12. FOCUS ON DATABASES We will focus on MySQL, but you can apply the same ideas to your own infrastructure without any problem.
  13. 13. BACKUP & RESTORE Take a snapshot of a system and restore it when you need it
  14. 14. Application
  15. 15. Backup
  16. 16. Restore
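The slides do not say which tool takes the snapshot. As one possible sketch for MySQL, the helpers below only build the command lines for `mysqldump` and the `mysql` client; nothing is executed. The host, user, and database names are placeholders, and credentials are assumed to come from an option file.

```python
from datetime import datetime
from typing import List


def backup_command(host: str, user: str, database: str, when: datetime) -> List[str]:
    """Build a mysqldump command that writes a timestamped dump file."""
    dump_file = f"{database}-{when:%Y%m%d-%H%M}.sql"
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",        # consistent dump of InnoDB tables without locking them
        f"--result-file={dump_file}",
        database,
    ]


def restore_command(host: str, user: str, database: str) -> List[str]:
    """Build the mysql client command; the dump file is fed to it on stdin."""
    return ["mysql", f"--host={host}", f"--user={user}", database]


# Placeholder names, chosen for this example only.
cmd = backup_command("db.example.internal", "backup", "shop", datetime(2014, 1, 1, 3, 0))
print(" ".join(cmd))
```

To actually run a restore you would pass the list to something like `subprocess.run(restore_command(...), stdin=open(dump_file))`. How long that run takes is exactly the restore time that counts towards the RTO.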
  17. 17. RTO & RPO? Things to remember…
  18. 18. RTO What resources can impact my RTO?
  19. 19. RESOURCES ALLOCATION How fast can we set up all resources (e.g. instances, network, etc.)?
  20. 20. DB RESTORE How long can the database restore take?
  21. 21. RPO What resources can impact my RPO?
  22. 22. DB SNAPSHOT How much time do we need to recover all data from our snapshot?
  23. 23. Backup & Restore – RPO & RTO
      Configuration
        • Resources Allocation: ???
        • Restore Operation: ???
        • DNS: TTL 30 minutes
        • Snapshot: every 24 hours
      Effects
        • RTO – Recovery Time Objective: 30 minutes + ??? + ???
        • RPO – Recovery Point Objective: 24 hours
        • Downtime per month: 99.8% availability = 86.23 minutes; 99.95% availability = 21.56 minutes
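The "downtime per month" figures follow from the availability percentage. The sketch below assumes a 30-day month, which gives 86.4 and 21.6 minutes; the slide's 86.23 and 21.56 are slightly lower, presumably because a slightly different month length was used.

```python
def monthly_downtime_minutes(availability_percent: float,
                             days_in_month: float = 30.0) -> float:
    """Minutes of downtime allowed per month at a given availability level."""
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (1 - availability_percent / 100)


for availability in (99.8, 99.95):
    budget = monthly_downtime_minutes(availability)
    print(f"{availability}% availability: about {budget:.1f} minutes of downtime per month")
```

With a 30-minute DNS TTL alone, a single incident already exceeds the 99.95% budget, before the unknown allocation and restore times are added.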
  24. 24. COSTS ON S3 (AWS) $0.085 per GB, durability 99.999999999% • $0.068 per GB, durability 99.99% • $0.010 per GB, durability 99.999999999% [Glacier]
  25. 25. Pilot light We keep a small set of resources always running, ready to help us activate the whole system.
  26. 26. Replication Pilot light is based on database replication strategies. For MySQL, async replication is used as the base strategy. http://www.slideshare.net/corleycloud/mysql-scale-out-cloudparty-2013-milano-talent-garden
  27. 27. ON-PREMISE – WEB APP
  28. 28. READ REPLICA ON A CLOUD PROVIDER
  29. 29. MOVE TO CLOUD ON A DISASTER
  30. 30. RTO & RPO? Things to remember…
  31. 31. RTO What resources can impact my RTO?
  32. 32. RESOURCES ALLOCATION Running and configuring new instances typically takes a couple of minutes. You always have to keep an eye on resources and timings.
  33. 33. DNS PROPAGATION DNS takes a while to propagate new addresses (Time To Live).
  34. 34. RPO What resources can impact my RPO?
  35. 35. DB REPLICATION Remember that Master/Slave replication is ASYNC! That implies replication LAG, and the lag impacts your RPO!
  36. 36. MONITOR YOUR INFRASTRUCTURE Setting an RPO of about 20 minutes implies that your replication LAG must always stay under 20 minutes!
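A monitoring check for this rule could look like the sketch below. It assumes the lag value is read from the `Seconds_Behind_Master` column of MySQL's `SHOW SLAVE STATUS`, which is NULL (here `None`) when replication is not running. The status labels and the 80% warning threshold are illustrative choices, not something from the talk.

```python
from typing import Optional

RPO_SECONDS = 20 * 60  # the 20-minute RPO used in the slides


def replication_status(seconds_behind_master: Optional[int],
                       rpo_seconds: int = RPO_SECONDS,
                       warn_ratio: float = 0.8) -> str:
    """Classify the replica's lag against the RPO."""
    if seconds_behind_master is None:
        # NULL lag: replication is stopped or broken, so the lag grows unbounded.
        return "CRITICAL"
    if seconds_behind_master >= rpo_seconds:
        return "CRITICAL"  # a disaster right now would lose more data than the RPO allows
    if seconds_behind_master >= warn_ratio * rpo_seconds:
        return "WARNING"   # still inside the RPO, but getting close
    return "OK"


for lag in (30, 1000, 1500, None):
    print(lag, replication_status(lag))
```

Fetching the value itself is left out on purpose: it depends on the MySQL client library and on the monitoring system you already use.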
  37. 37. Pilot Light – RPO & RTO
      Configuration
        • Resources Allocation: 20 minutes
        • DNS: TTL 30 minutes
        • Replication LAG: 20 minutes
      Effects
        • RTO – Recovery Time Objective: 50 minutes
        • RPO – Recovery Point Objective: 20 minutes
        • Downtime per month: 99.8% availability = 86.23 minutes; 99.95% availability = 21.56 minutes
  38. 38. COSTS ON AWS $0.06 per hour (1 m1.small ≈ $43 per month) • $0.05 per GB EBS • $0.05 per 1 million I/O requests EBS
  39. 39. WARM STANDBY Extends pilot-light resource allocation and preparation
  40. 40. Warm Standby
  41. 41. Warm Stand-by
  42. 42. Warm Stand-by – RPO & RTO
      Configuration
        • Resources Allocation: 5 minutes
        • DNS: TTL 30 minutes
        • Replication LAG: 20 minutes
      Effects
        • RTO – Recovery Time Objective: 35 minutes
        • RPO – Recovery Point Objective: 20 minutes
        • Downtime per month: 99.8% availability = 86.23 minutes; 99.95% availability = 21.56 minutes
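The Pilot Light and Warm Stand-by figures can be compared side by side. The sketch assumes, as the slides' totals suggest, that RTO is resource allocation plus DNS TTL and that RPO equals the replication lag. Reading the "downtime per month" rows as a budget that one incident must fit into is an interpretation of the slide, not something it states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    resource_allocation_min: int
    dns_ttl_min: int
    replication_lag_min: int

    @property
    def rto_min(self) -> int:
        # Assumes allocation and DNS propagation happen one after the other.
        return self.resource_allocation_min + self.dns_ttl_min

    @property
    def rpo_min(self) -> int:
        return self.replication_lag_min


def fits_budget(strategy: Strategy, downtime_budget_min: float) -> bool:
    """True if a single incident stays within the monthly downtime budget."""
    return strategy.rto_min <= downtime_budget_min


PILOT_LIGHT = Strategy("Pilot Light", 20, 30, 20)
WARM_STANDBY = Strategy("Warm Stand-by", 5, 30, 20)
BUDGETS = {"99.8%": 86.23, "99.95%": 21.56}  # minutes per month, from the slides

for strategy in (PILOT_LIGHT, WARM_STANDBY):
    for label, budget in BUDGETS.items():
        print(strategy.name, f"RTO={strategy.rto_min}min", label, fits_budget(strategy, budget))
```

Under these assumptions both strategies fit the 99.8% budget and neither fits the 99.95% one, because the 30-minute DNS TTL alone is already larger than 21.56 minutes.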
  43. 43. COSTS ON AWS $0.06 per hour (2 m1.small ≈ $86 per month) • $0.05 per GB EBS • $0.05 per 1 million I/O requests EBS • ELB $20 per month
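The monthly figures on the two cost slides can be reproduced with a small calculation. The sketch assumes a 30-day month and leaves EBS storage and I/O out, since the slides give no volume size or request count.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month


def monthly_cost(instances: int, hourly_price: float = 0.06,
                 elb_monthly: float = 0.0) -> float:
    """Instance cost per month plus an optional flat load balancer fee (USD)."""
    return instances * hourly_price * HOURS_PER_MONTH + elb_monthly


pilot_light = monthly_cost(instances=1)                     # one m1.small
warm_standby = monthly_cost(instances=2, elb_monthly=20.0)  # two m1.small plus ELB

print(f"Pilot Light:   ${pilot_light:.2f} per month")
print(f"Warm Stand-by: ${warm_standby:.2f} per month")
```

With these inputs Pilot Light comes to about $43 and Warm Stand-by to about $106 per month, before EBS charges.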
  44. 44. PILOT LIGHT VS WARM STAND-BY In our examples, Pilot Light actually looks much more effective than Warm Stand-by. Doesn’t it?
  45. 45. DEPENDS ON ASSUMPTIONS We assumed that we don’t need to scale out our database and that scaling it up is enough! What about resource allocation for new read replicas? How long does it take?
  46. 46. THANKS FOR LISTENING
