CLOUDCONF 2014
Database: backup and disaster recovery in the Cloud
Walter Dal Mut
@walterdalmut – www.corley.it – walterdalmut.c...

Disaster Recovery - On-Premise & Cloud


We will cover different scenarios for Disaster Recovery

Published in: Technology, Business

Transcript of "Disaster Recovery - On-Premise & Cloud"

  1. CLOUDCONF 2014. Database: backup and disaster recovery in the Cloud. Walter Dal Mut, @walterdalmut – www.corley.it – walterdalmut.com
  2. DISASTER RECOVERY: Disaster recovery (DR) is about preparing for and recovering from a disaster.
  3. DISASTER: Any event that has a negative impact on your business continuity or finances could be termed a disaster.
  4. WHY WE ARE TALKING ABOUT DR?
     • Over 70% of businesses involved in a major fire either do not reopen or subsequently fail within 3 years of the fire. (Source: continuitycentral.com)
     • 80% of businesses affected by a major incident either never re-open or close within 18 months. (Source: Axa)
     • 70% of companies go out of business after a major data loss. (Source: continuitycentral.com)
     • 80% of businesses suffering a computer disaster, who have no disaster recovery plans, go out of business. (Source: "A Bridge Too Far", IBM Business Recovery Service & Cranfield, 1993)
     • A recent study from Gartner, Inc. found that 90% of companies that experience data loss go out of business within two years.
     • 80% of companies without well-conceived data protection and recovery strategies go out of business within 2 years of a major disaster. (Source: US National Archives and Records Administration)
  5. RTO – RECOVERY TIME OBJECTIVE: The time within which, and the service level to which, a business process must be restored after a disaster.
  6. RTO: WHAT DOES IT IMPLY?
     • A system records 1000 transactions per hour
     • A snapshot of the system is taken at 03:00 am (every day)
     • At 10:00 am a disaster occurs
     • You spend 1 hour sorting things out for the backup (off-site retrieval, preparation, etc.)
     • The recovery operation takes 4 hours to get back into operation (at minimum service level)
     • 5 hours is the RECOVERY TIME OBJECTIVE
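The arithmetic on this slide can be sketched in a few lines of Python. The function name `recovery_time` and the calendar date are illustrative assumptions, not part of the talk; only the 1-hour preparation and 4-hour restore come from the slide.

```python
from datetime import datetime, timedelta


def recovery_time(preparation: timedelta, restore: timedelta) -> timedelta:
    """Time from the disaster until service is back at minimum level."""
    return preparation + restore


# Durations from the slide; the calendar date itself is arbitrary.
disaster_at = datetime(2014, 1, 1, 10, 0)
rto = recovery_time(preparation=timedelta(hours=1), restore=timedelta(hours=4))
service_restored_at = disaster_at + rto

print(f"RTO: {rto}, service restored at {service_restored_at:%H:%M}")
```

Note that the snapshot time (03:00 am) plays no role here: it matters for the RPO, not for the RTO.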
  7. RPO – RECOVERY POINT OBJECTIVE: The acceptable amount of data loss, measured in time.
  8. RPO: WHAT DOES IT IMPLY?
     • A system records 1000 transactions per hour
     • A snapshot of the system is taken at 03:00 am (every day)
     • At 10:00 am a disaster occurs
     • In this case we lose around 7000 transactions:
       • 1000 transactions from 03:00 to 04:00
       • 1000 transactions from 04:00 to 05:00
       • …
     • But: with a daily snapshot we are accepting up to 24 hours of data loss, i.e. 24000 transactions (RPO)
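As a rough sketch of the data-loss arithmetic above (the helper name `transactions_lost` is hypothetical; the rate and times are the ones on the slide):

```python
TRANSACTIONS_PER_HOUR = 1000   # rate given on the slide
SNAPSHOT_INTERVAL_HOURS = 24   # one snapshot per day


def transactions_lost(hours_since_snapshot: float,
                      rate: int = TRANSACTIONS_PER_HOUR) -> int:
    """Transactions written after the last snapshot, hence not recoverable."""
    return int(hours_since_snapshot * rate)


# Snapshot at 03:00, disaster at 10:00: 7 hours of writes are gone.
lost_in_example = transactions_lost(10 - 3)

# Worst case: the disaster hits just before the next daily snapshot.
worst_case = transactions_lost(SNAPSHOT_INTERVAL_HOURS)

print(lost_in_example, worst_case)
```

The worst case, not the loss in one particular incident, is what the RPO describes.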
  9. DISASTER RECOVERY STRATEGIES: Local tape backup, online backup, pilot light, warm stand-by, and more… (cost scale: $, $$$, $$$$$$; recovery-time scale: seconds to days)
  10. ON-PREMISE & CLOUD: Use cloud resources to provide business continuity.
  11. Disaster Recovery & Cloud?
      • On demand: we can allocate and release resources whenever we need to
      • Cost effective: pay-as-you-go model; we pay only for the resources we actually use
      • Scalable: we can scale freely and adapt our strategy thanks to autoscaling and other mechanisms
      • Secure: control doesn’t mean security
  12. FOCUS ON DATABASES: We will focus on MySQL, but you can apply the same ideas to your own infrastructure without any problem.
  13. BACKUP & RESTORE: Take a snapshot of a system and restore it when you need it.
  14. Application
  15. Backup
  16. Restore
  17. RTO & RPO? Things to remember…
  18. RTO: Which resources can impact my RTO?
  19. RESOURCES ALLOCATION: How fast can we set up all resources (e.g. instances, network, etc.)?
  20. DB RESTORE: How long can the database restore take?
  21. RPO: Which resources can impact my RPO?
  22. DB SNAPSHOT: How much time do we need to recover all data from our snapshot?
  23. Backup & Restore – RPO & RTO
      Configuration
      • Resources allocation: ???
      • Restore operation: ???
      • DNS: TTL 30 minutes
      • Snapshot: every 24 hours
      Effects
      • RTO (Recovery Time Objective): 30 minutes + ??? + ???
      • RPO (Recovery Point Objective): 24 hours
      • Downtime per month: 99.8% availability → 86.23 minutes; 99.95% availability → 21.56 minutes
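The monthly downtime budgets quoted above follow from the availability percentage. A minimal sketch, assuming a 30-day month (the slide does not state which month length it used, and the function name is illustrative):

```python
def allowed_downtime_minutes(availability_percent: float,
                             days_in_month: float = 30.0) -> float:
    """Minutes per month a service may be down at the given availability."""
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (1 - availability_percent / 100)


for availability in (99.8, 99.95):
    budget = allowed_downtime_minutes(availability)
    print(f"{availability}% availability: {budget:.2f} minutes per month")
```

With a 30-day month this gives about 86.4 and 21.6 minutes, close to the 86.23 and 21.56 minutes on the slide; the small gap presumably comes from a slightly different month length. Either way, a single recovery with an RTO above the budget uses up the whole month's allowance.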
  24. COSTS ON S3 (AWS)
      • $0.085 per GB, durability 99.999999999%
      • $0.068 per GB, durability 99.99%
      • $0.010 per GB, durability 99.999999999% [Glacier]
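To get a feel for these prices, here is a small sketch that estimates the storage bill for a backup. It assumes the prices are per GB per month and uses a hypothetical 500 GB backup; the dictionary keys are made-up labels, and request or retrieval fees are not modelled.

```python
# 2014 prices from the slide, assumed to be per GB per month.
PRICE_PER_GB = {
    "eleven_nines": 0.085,  # durability 99.999999999%
    "four_nines": 0.068,    # durability 99.99%
    "glacier": 0.010,       # durability 99.999999999% [Glacier]
}


def monthly_storage_cost(size_gb: float, price_per_gb: float) -> float:
    """Storage-only cost in dollars; transfer and request fees are excluded."""
    return size_gb * price_per_gb


backup_size_gb = 500  # hypothetical backup size
for tier, price in PRICE_PER_GB.items():
    cost = monthly_storage_cost(backup_size_gb, price)
    print(f"{tier}: ${cost:.2f} per month")
```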
  25. Pilot light: We can keep a small resource always active that helps us bring up a whole system.
  26. Replication: Pilot light is basically built on database replication strategies. For MySQL, async replication is used as the base strategy. http://www.slideshare.net/corleycloud/mysql-scale-out-cloudparty-2013-milano-talent-garden
  27. ON-PREMISE – WEB APP
  28. READ REPLICA ON A CLOUD PROVIDER
  29. MOVE TO CLOUD ON A DISASTER
  30. RTO & RPO? Things to remember…
  31. RTO: Which resources can impact my RTO?
  32. RESOURCES ALLOCATION: Running and configuring new instances typically takes a couple of minutes. You always have to keep an eye on resources and times.
  33. DNS PROPAGATION: DNS takes a little while to propagate new addresses (Time To Live).
  34. RPO: Which resources can impact my RPO?
  35. DB REPLICATION: Remember that master/slave replication is ASYNC! This implies replication lag, and that lag impacts your RPO!
  36. MONITOR YOUR INFRASTRUCTURE: Setting an RPO of about 20 minutes implies that your replication lag must always stay under 20 minutes!
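A monitoring check along these lines could look like the sketch below. It assumes the lag value is the `Seconds_Behind_Master` column reported by MySQL's `SHOW SLAVE STATUS`, which is NULL when replication is not running; how that value is fetched (client library, monitoring agent) is left out, and the function name is hypothetical.

```python
from typing import Optional

RPO_MINUTES = 20  # target from the slide


def lag_within_rpo(seconds_behind_master: Optional[int],
                   rpo_minutes: int = RPO_MINUTES) -> bool:
    """Return True when the replica is close enough to honour the RPO.

    None models a NULL lag value (replication stopped or broken),
    which is treated as a violation because the real lag is unknown.
    """
    if seconds_behind_master is None:
        return False
    return seconds_behind_master < rpo_minutes * 60


# Example readings: 10 minutes behind, 25 minutes behind, replication stopped.
for reading in (600, 1500, None):
    status = "OK" if lag_within_rpo(reading) else "ALERT"
    print(f"lag={reading}: {status}")
```

Treating an unknown lag as a failure is a deliberate choice here: a stopped replica falls further behind every minute, so it should raise an alert rather than pass silently.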
  37. Pilot Light – RPO & RTO
      Configuration
      • Resources allocation: 20 minutes
      • DNS: TTL 30 minutes
      • Replication lag: 20 minutes
      Effects
      • RTO (Recovery Time Objective): 50 minutes
      • RPO (Recovery Point Objective): 20 minutes
      • Downtime per month: 99.8% availability → 86.23 minutes; 99.95% availability → 21.56 minutes
  38. COSTS ON AWS
      • $0.06 per hour → 1 m1.small ≈ $43 per month
      • $0.05 per GB EBS
      • $0.05 per 1 million I/O requests EBS
  39. WARM STANDBY: Extends pilot-light resource allocation and preparation.
  40. Warm Standby
  41. Warm Stand-by
  42. Warm Stand-by – RPO & RTO
      Configuration
      • Resources allocation: 5 minutes
      • DNS: TTL 30 minutes
      • Replication lag: 20 minutes
      Effects
      • RTO (Recovery Time Objective): 35 minutes
      • RPO (Recovery Point Objective): 20 minutes
      • Downtime per month: 99.8% availability → 86.23 minutes; 99.95% availability → 21.56 minutes
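The RTO and RPO figures for pilot light and warm stand-by can be reproduced with a small model. This is a sketch of the arithmetic the slides appear to use (RTO = resource allocation + DNS TTL, RPO = replication lag); the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    resource_allocation_min: int
    dns_ttl_min: int
    replication_lag_min: int

    @property
    def rto_min(self) -> int:
        # Time to bring resources up plus time for DNS changes to take effect.
        return self.resource_allocation_min + self.dns_ttl_min

    @property
    def rpo_min(self) -> int:
        # Async replication: anything still in flight is lost.
        return self.replication_lag_min


pilot_light = Strategy("pilot light", 20, 30, 20)
warm_standby = Strategy("warm stand-by", 5, 30, 20)

for s in (pilot_light, warm_standby):
    print(f"{s.name}: RTO {s.rto_min} min, RPO {s.rpo_min} min")
```

In both cases the 30-minute DNS TTL is the largest single term, so lowering the TTL would shorten the RTO of either strategy by the same amount.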
  43. COSTS ON AWS
      • $0.06 per hour → 2 m1.small ≈ $86 per month
      • $0.05 per GB EBS
      • $0.05 per 1 million I/O requests EBS
      • ELB: $20 per month
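The monthly figures on the two cost slides can be checked with a short calculation. This sketch assumes a 30-day month and leaves out EBS storage and I/O charges, since those depend on volume size and workload; the function name is illustrative.

```python
HOURS_PER_MONTH = 24 * 30      # assumes a 30-day month
M1_SMALL_HOURLY = 0.06         # 2014 price from the slides
ELB_MONTHLY = 20.0             # from the warm stand-by slide


def monthly_instance_cost(instances: int,
                          hourly_price: float = M1_SMALL_HOURLY) -> float:
    """Compute-only cost in dollars for instances running all month."""
    return instances * hourly_price * HOURS_PER_MONTH


pilot_light_cost = monthly_instance_cost(1)
warm_standby_cost = monthly_instance_cost(2) + ELB_MONTHLY

print(f"pilot light:   ${pilot_light_cost:.2f} per month")
print(f"warm stand-by: ${warm_standby_cost:.2f} per month")
```

Under these assumptions warm stand-by costs roughly two and a half times as much as pilot light each month, in exchange for the shorter resource-allocation time.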
  44. PILOT LIGHT VS WARM STAND-BY: In our examples, pilot light looks much more effective than warm stand-by. Doesn’t it?
  45. DEPENDS ON ASSUMPTIONS: We assume that we don’t need to scale out our database and that scaling it up is enough! What about resource allocation for new read replicas? How long does it take?
  46. THANKS FOR LISTENING