NICTA, Disaster Recovery Using OpenStack
Jorke Odolphi, NICTA, Disaster Recovery Solution using OpenStack, Thurs, 3:50 pm session

Presentation Transcript

  • Building a Disaster Recovery Solution using OpenStack. Jorke Odolphi, Principal Research Engineer, NICTA (@jorke)
  • The Team
  • Yuru: 'cloud' in the language of the Gamilaraay people, NSW
  • Problem: The cloud can fail. The online businesses that rely on and benefit most from the cloud often don't have the skills to handle failure.
  • Disaster Recovery: "process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organisation after a natural or human-induced disaster"* (*according to Wikipedia)
  • RPO, Recovery Point Objective: "maximum tolerable period in which data might be lost from an IT service due to a major incident..."* (*according to Wikipedia)
  • RTO, Recovery Time Objective: "duration of time and a service level within which a business process must be restored after a disaster..."* (*according to Wikipedia)
  • (Diagram) Recovery Point Objective and Recovery Time Objective each sit somewhere on a spectrum, from real-time recovery/failover with zero downtime at one end to recovery "sometime..." at the other.
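The two objectives above can be made concrete with a toy calculation: achieved RPO is the data window between the last consistent backup and the incident, and achieved RTO is the downtime until service is restored. The function name and timestamps below are illustrative, not from the talk:

```python
from datetime import datetime

def rpo_rto(last_backup, incident, service_restored):
    """Achieved RPO (data at risk) and RTO (downtime) for one incident."""
    rpo = incident - last_backup       # data written since last backup is lost
    rto = service_restored - incident  # time the business process was down
    return rpo, rto

# With an hourly sync (as in this talk's warm-standby design), the
# worst-case RPO is capped at roughly one hour.
rpo, rto = rpo_rto(
    last_backup=datetime(2012, 8, 16, 14, 0),
    incident=datetime(2012, 8, 16, 14, 45),
    service_restored=datetime(2012, 8, 16, 15, 15),
)
print(rpo)  # 0:45:00 of data at risk
print(rto)  # 0:30:00 of downtime
```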
  • Our Goal: Without re-architecting your application, provide a configurable warm-standby solution with a known, consistent RPO, reducing RTO and minimising business impact.
  • Goals and Challenges: Replicate the application over to OpenStack in case of a disaster. Preserve the running environment of the application, which includes: compute instances, networks, DNS. Minimise RTO and RPO, AND cost!
  • (Typical stack) Public IP / load-balanced web front end (Apache/Nginx/IIS) → private IP → application processing / memcache → private IP → database (MySQL/PostgreSQL/MSSQL)
  • Architecting for DR in Cloud: Virtualise your servers (snapshotting support in the hypervisor, primarily at the disk level). Use dynamic DNS solutions, e.g. Route 53, Anycast DNS.
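The dynamic-DNS piece boils down to a failover decision: keep the record pointing at the primary site while it is healthy, and flip it to the warm standby when health checks fail. A minimal sketch of that decision logic, with hypothetical site names (the actual record update would go through the DNS provider's API, e.g. Route 53):

```python
def choose_target(health, primary="aws-prod", standby="openstack-dr"):
    """Return the site the DNS record should point at.

    `health` maps site name -> bool (result of an external health check).
    Fail over only when the primary is down AND the standby is actually
    up; if nothing is healthy, leave the record where it is.
    """
    if health.get(primary):
        return primary
    if health.get(standby):
        return standby
    return primary  # nothing healthy: don't flap the record

print(choose_target({"aws-prod": True, "openstack-dr": True}))   # aws-prod
print(choose_target({"aws-prod": False, "openstack-dr": True}))  # openstack-dr
```

Keeping the decision pure (no API calls inside) makes it easy to test, and the short DNS TTLs that dynamic-DNS providers support are what make the flip effective quickly.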
  • Compatibility across IaaS Clouds

    Cloud Provider  Framework   Compute Instance  Object Store  Block Storage  Network  Security Group
    AWS             Custom      ✓                 ✓             ✓              DHCP     ✓
    Rackspace       Custom      ✓                 ✓             ✗              STATIC   ✗
    Ninefold        CloudStack  ✓                 ✓             ✓              DHCP     ✓
    TryStack        OpenStack   ✓                 ✓             ✓              DHCP     ✓
    HP Cloud        OpenStack   ✓                 ✓             ✗              DHCP     ✓

    • Replication from one cloud to another is NOT always possible
    • Some clouds do not have all the technology pieces (e.g., Block Storage)
    • Minimum requirements for replicating application servers:
      • compute instance and persistent storage, such as object store or block storage
      • snapshot service (to ensure point-in-time consistency)
      • hypervisor support (e.g., PVGrub)
  • Overview of DR Process (diagram): on AWS, take snapshot → create volume → mount → send to storage; on OpenStack, download from storage → partition → mount on new instance.
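The pipeline above is a fixed sequence of steps. The sketch below names each step and lets callers plug in real implementations per step; every handler here is a hypothetical stand-in (the real ones would call the EC2 snapshot API, mount block devices, and so on), which makes the ordering testable without touching a cloud:

```python
DR_STEPS = [
    "take_snapshot",          # point-in-time copy of the AWS volume
    "create_volume",          # materialise a volume from the snapshot
    "mount_volume",           # attach it so its blocks can be read
    "send_to_storage",        # push the data to shared object storage
    "download_from_storage",  # pull it down on the OpenStack side
    "partition",              # write the partition table on the target
    "mount_on_instance",      # attach to the new OpenStack instance
]

def run_dr(steps=DR_STEPS, handlers=None):
    """Run each DR step in order; `handlers` maps step name -> callable,
    so real implementations (or test doubles) can be injected."""
    handlers = handlers or {}
    log = []
    for step in steps:
        handlers.get(step, lambda: None)()  # no-op when no handler given
        log.append(step)
    return log

print(run_dr())  # the seven steps, in order
```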
  • Building DR using OpenStack
    Progress:
    – Deploying OpenStack in our NICTA lab
    – Successfully replicated AWS compute instances to OpenStack
      • In Rackspace OpenStack public cloud (private beta)
      • Instances created from standard 64-bit EXT3 AWS OpenSuse image
    Requirements:
    – Xen support for PVGrub
    – Write access to partition table
    – Network support
  • Problems: Latency. Point in time. Log and replay / transactional. How do modern databases handle broken transactions / problem disks? Rollback.
  • Optimisations: Incremental Backup. A typical AWS system volume is around 10 GB, and replication is tricky for large data volumes.
    – Initial backup:
      • Send the whole data volume (unavoidable!)
      • Optimise by compression and skipping empty space (0's)
    – Subsequent backups:
      • Incremental: partition a volume into chunks and resend only the difference (the 'delta')
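The incremental scheme above can be sketched in a few lines: split the volume image into fixed-size chunks, skip all-zero chunks on the initial pass, and on later passes resend only chunks whose hash changed. The chunk size and hash choice here are illustrative assumptions, not the talk's actual parameters:

```python
import hashlib

CHUNK = 4096  # illustrative; real volumes would use MB-sized chunks

def chunks(data: bytes):
    for off in range(0, len(data), CHUNK):
        yield off, data[off:off + CHUNK]

def initial_backup(volume: bytes):
    """Full pass: record a hash per chunk; skip all-zero chunks entirely."""
    index, to_send = {}, []
    for off, chunk in chunks(volume):
        index[off] = hashlib.sha256(chunk).hexdigest()
        if chunk.strip(b"\x00"):   # non-empty chunk: worth sending
            to_send.append(off)
    return index, to_send

def incremental_backup(volume: bytes, index: dict):
    """Later passes: resend only chunks whose hash changed (the 'delta')."""
    delta = []
    for off, chunk in chunks(volume):
        digest = hashlib.sha256(chunk).hexdigest()
        if index.get(off) != digest:
            delta.append(off)
            index[off] = digest
    return delta

vol = b"A" * CHUNK + b"\x00" * CHUNK + b"B" * CHUNK
index, sent = initial_backup(vol)
print(sent)   # [0, 8192] - the all-zero chunk at offset 4096 is skipped
vol = b"A" * CHUNK + b"\x00" * CHUNK + b"C" * CHUNK
print(incremental_backup(vol, index))   # [8192] - only the changed chunk
```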
  • Large Data Transfer Across Cloud Datacenters: Why so slow?
  • Optimisations: Large Data Transfer Across Cloud Datacenters for DR
    Problem: Transferring large data volumes is slow. Where is the bottleneck?
      • Reading from the source volume? YES!!
      • Transferring across LAN/WAN?
      • Writing to destination volume?
    Our solution: rapidly cloning data volumes from snapshots, with parallel transfers.
    (Chart) Data Transfer Evaluations, 1 clone vs 4 clones, for Volume Scan (MB/s) and End-to-end Transfer (MB/s); figures shown: 190, 140, 50 and 40 MB/s.
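The parallel-clone idea can be sketched as scanning a volume's chunks through several workers at once, each worker standing in for one clone reading its share of the snapshot. The in-memory "volume" and the function names are illustrative; the worker count mirrors the 4-clone configuration from the evaluation:

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

CHUNK = 4096

def read_chunk(volume, off):
    """Simulate one clone scanning its chunk; return (offset, digest)
    so results can be reassembled in order at the destination."""
    return off, hashlib.sha256(volume[off:off + CHUNK]).hexdigest()

def parallel_scan(volume, clones=4):
    """Scan all chunks with `clones` concurrent workers."""
    offsets = range(0, len(volume), CHUNK)
    with ThreadPoolExecutor(max_workers=clones) as pool:
        results = pool.map(lambda off: read_chunk(volume, off), offsets)
    return dict(results)   # offset -> digest, order-independent

vol = bytes(range(256)) * 64          # 16 KB toy volume -> 4 chunks
digests = parallel_scan(vol, clones=4)
print(sorted(digests))                # [0, 4096, 8192, 12288]
```

Since the bottleneck identified above is reading from the source volume, fanning the scan out over several cloned volumes is what raises throughput; within one process, real gains would come from cloning the volume, not just adding threads.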
  • Reversing..
  • Point us to your instances → replicate to new cloud/region → automatically sync changes every hour → if the worst happens: failover.
  • Questions? Or answers? Jorke @jorke