The Pensions Trust - VM Backup Experiences
  • Old Edinburgh office held about 60 staff. It had its own ESX servers, SAN equipment, etc. It was used as a full DR location, with bidirectional backup coverage. 10 Mb WAN link – we’ve had to work within this at all times.
  • Good old BackupExec! Tape has always had its issues: Bad media. Lost tapes! Insecure storage. Low number of available staff @ DR meant low numbers of servers available to be restored.
  • TPT has no regulatory compliance issues.
  • How much time have we all spent on backups? We need to eliminate this for a system that ‘just works’. I myself used to spend every evening dialling in and checking the state of each backup or replication job and resetting the various jobs. It used to take up a LOT of my time. Some companies may choose to only have an archive onsite and live data only offsite. They’ll accept the loss of the archive in a disaster event. It’s all about risk, and it’s the business who make the call and set the budget. We want application level restore without the need for multiple backup applications. One application that can do it all.
  • Fast recovery depends on your means! Synchronous, real time replication is great but costs the earth, both in storage and bandwidth. It’s all about what is acceptable to your business. Consistent data – Exchange, SQL, AD to be up and running without the need to conduct repairs. DR normally means desktops need to be available too – VDI is the way to go here. Have a standby virtual desktop pool, keep it updated with your head office images and simply fire it up via the web from your offsite ESX when needed.
  • BEFORE SLIDE So, what happened with Mirrorview? A reseller recently told me that EMC have stopped selling Mirrorview/A; they were told that it ‘never worked properly’. Unfortunately, it’s the product we chose to try storage level replication. Mirrorview/S is still available.
  • AFTER SLIDE: Instead of getting a new SAN, we had to work out what functionality we were missing from our current SAN – replication and deduplication efficiency, and get that from an alternate device.
  • How much time have we all spent on backups? We need to eliminate this for a system that ‘just works’. I myself used to spend every evening dialling in and checking the state of each backup or replication job and resetting the various jobs. It used to take up a LOT of my time. Backup archive – some companies need to meet regulatory compliance and keep many years’ worth of data. Deduplicated storage devices are a great option here. Some companies may choose to only have an archive onsite and live data only offsite. They’ll accept the loss of the archive in a disaster event. It’s all about risk, and it’s the business who make the call and set the budget. We want application level restore without the need for multiple backup applications. One application that can do it all.

Presentation Transcript

  • VMware Backup Experiences Darren Bull Business Support Manager, The Pensions Trust
    • I’m no expert – jump in with comments/corrections
    • Everybody is different –
      • Each solution will depend on budget, recovery objectives and available infrastructure.
    • These are only our experiences –
      • Things that didn’t work for us may work for you.
      • We’ll focus on the things we have worked with.
    Before we start
    • 160 staff.
    • 3 sites:
      • Leeds/Edinburgh/London
      • Originally DR site was Edinburgh office.
      • Since downsize of Edinburgh office, now use rented rack space to house DR kit.
    • 10 Mb WAN link to DR site.
    • 3 IT Infrastructure staff.
    About TPT
    • Legacy application - BackupExec.
    • LTO2 tape archive (with associated issues):
      • Tapes can go bad.
      • Stored off site with Iron Mountain.
      • Tapes have gone missing.
      • Management of tape rotation.
    • Manual rebuild of servers during DR test:
      • Dissimilar hardware.
      • 48 hours to complete, not able to do full recovery.
    Backups prior to virtualisation
    • Server consolidation began late 2006.
    • Complete summer 2007.
    • 40 virtualised servers. Approx 25 critical for DR.
    • Simplified disaster recovery & backups one of the main drivers for the project.
    The move to VMware
    • Information archiving
      • Keep at least the last 12 months.
    • Disaster recovery
      • Recover systems within 24 hours.
    VMware backups – considerations
    • Backups that work! No constant checking.
    • A backup archive.
      • On site and offsite.
    • Minimal administration, vSphere integration. Set it and forget it.
    • Quick backups, within the available window.
    • Quick restores.
      • File level
      • Image level.
      • Application level (e.g. Exchange mailboxes)
    • No tapes.
    • Efficient use of storage (de-duplication)
    • Secure backup data.
    Backups - what did we want?
    • Fast offsite recovery.
    • Consistent data.
      • SQL/Exchange/Active Directory.
    • This means desktops too:
      • Deployed VMware View in 2009.
    DR - what do we want?
    • The business must decide its recovery objective and provide the funds to achieve it.
      • TPT Objective: 24 hours lost data acceptable.
    • Once the recovery objective is determined, many options may be ruled out.
      • TPT didn’t need synchronous real time replication, could use cheaper options to be up in 24 hours.
    • Even with small budgets, many things are possible:
      • Redeploy old ESX servers/storage offsite.
      • Shop around for bandwidth.
      • With the latest backup applications, you don’t need expensive storage to make some things happen.
    Limiting factors
    • Installed EMC Clariion CX3-20 as part of consolidation project.
      • 2nd unit installed in old Edinburgh office.
    • Used Mirrorview/A for bidirectional site to site replication of VMFS data stores.
    • Continued to use BackupExec and tape for archiving.
    • Take snapshot of replicated LUN, make writeable, mount in ESX, power on VM for server recovery.
    TPT approach (1)
  • TPT Approach (1)
    • Asynchronous mode
      • TPT ran 1 job per LUN per day.
      • Replication of entire LUN.
      • New VMs on replicated LUNs added huge replication burden.
      • No de-duplication.
      • Available bandwidth an issue.
        • Mirrors wouldn’t just go slow, but fail completely.
        • Could only run so many sessions at once.
        • Mirrors fell further and further behind as failed jobs had to start from scratch.
        • Jobs needed constant monitoring.
    • EMC no longer sell it.
    Mirrorview/A - experiences
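The bandwidth problem on this slide comes down to simple arithmetic. The following is a minimal sketch, assuming the WAN link is 10 Mbit/s, decimal units (1 GB = 8,000 megabits) and, unless an efficiency factor is passed, full use of the link with no protocol overhead. The 500 GB LUN size is a hypothetical figure for illustration, not one taken from TPT.

```python
def transfer_hours(data_gb, link_mbps, efficiency=1.0):
    """Hours needed to send data_gb gigabytes over a link of link_mbps megabits/s."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    megabits = data_gb * 1000 * 8                  # decimal GB -> megabits
    seconds = megabits / (link_mbps * efficiency)  # usable megabits per second
    return seconds / 3600


def max_daily_gb(link_mbps, efficiency=1.0, window_hours=24.0):
    """Largest amount of data (GB) the link can carry inside the window."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    return link_mbps * efficiency * window_hours * 3600 / 8 / 1000


if __name__ == "__main__":
    # Ceiling for a 10 Mbit/s link running flat out for 24 hours.
    print(max_daily_gb(10))                    # 108.0
    # Hypothetical 500 GB LUN replicated in full, as whole-LUN replication does.
    print(round(transfer_hours(500, 10), 1))   # 111.1
```

Under these assumptions a 10 Mbit/s link moves at most about 108 GB per day, so any whole-LUN job larger than that cannot finish inside a daily cycle, and a job that fails and restarts from scratch falls further behind.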
    • EMC/NetApp/HP (and others) now offer products that work much better with VMware:
      • Deduplicated primary storage
      • Changed block tracking - efficient replication over slow links.
    • Obtaining this functionality is expensive:
      • We found it difficult to obtain budget – management saw ‘nothing wrong’ with existing SAN.
    Mirrored SAN - alternatives
    • We needed to fix the replication problem.
    • Installed 2 x DataDomain DD510.
      • CIFS/NFS/VTL backup target.
      • Can mount as an ESX datastore.
      • Site to site bit level replication.
      • De-duplicated storage.
      • Massive savings on VMDK archive storage – 40x de-duplication achieved.
      • Acts as backup archive storage and offsite replication engine for disaster recovery.
      • All backups replicated offsite within 24 hours.
      • Throw away tapes.
      • Secure offsite backups, no physical media in transit.
    TPT approach (2)
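To show why repeated full VMDK backups deduplicate so well, here is a toy sketch of block-level deduplication. It is an illustration only, assuming fixed-size blocks keyed by SHA-256; it does not describe how DataDomain is implemented, and the class name `DedupStore` and the tiny block size are made up for the example.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store: each unique block is kept once, keyed by its hash."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}        # digest -> block bytes, stored only once
        self.logical_bytes = 0  # total bytes the clients believe they wrote

    def write(self, data):
        """Store data and return its recipe (ordered list of block digests)."""
        recipe = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # keep only the first copy
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe):
        """Rebuild the original data from a recipe."""
        return b"".join(self.blocks[digest] for digest in recipe)

    def physical_bytes(self):
        return sum(len(block) for block in self.blocks.values())

    def ratio(self):
        """Logical bytes written divided by physical bytes actually stored."""
        physical = self.physical_bytes()
        return self.logical_bytes / physical if physical else 0.0


if __name__ == "__main__":
    store = DedupStore(block_size=4)
    image = b"AAAABBBBCCCC"                  # stand-in for a VM image
    recipes = [store.write(image) for _ in range(10)]  # ten identical full backups
    print(store.ratio())                     # 10.0
    assert store.read(recipes[0]) == image
```

Ten identical full backups occupy the space of one, giving a 10x ratio in this toy case. Real ratios depend on how much data changes between backups and how much is shared across VMs.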
  • TPT approach (2)
    • Tips before starting:
      • Cannot snapshot persistent disks.
      • Give a VM’s disks different names, even if on different LUNs.
      • Throughput issue doing network backups using vSphere.
        • Service console LAN throughput limitation.
        • Patch has been released (but I’ve not tried it).
        • Affected any image level backup application using LAN mode.
      • ESX3.x Snapshot timeout issue:
        • 15 mins timeout, VC will report timeout to VCB proxy, even if ESX host continues and commits the snapshot.
      • Changed block tracking must be enabled in a VM (requires VM hardware version 7).
    Change the backup software
    • Image level backup of VMs to DataDomain.
    • DataDomain takes care of replication.
    • File level restore.
    • Restore server-by-server @ DR site.
    • TPT started with version 3.x. First installed late 2008.
    • vRanger now at version 4.
    • Use vReplicator for replication of VMs.
    • Vizioncore now owned by Quest Software.
    Vizioncore vRanger
    • Struggled to work within backup window. 24 hour job cycle.
    • Had issues with snapshot timeouts (ESX 3.x).
      • Had to use LAN based backups direct to ESX to work around this.
    • Had issues with vRanger 3 backup naming inconsistencies:
      • ‘Could not find the compressed disk to mount’ doing a FLR or DR site recovery.
      • Much messing around with VMX/VMDK/INFO files to repair this and get restores working.
      • Never really seemed to be fixed.
      • VSS integration never worked well.
    • Upgraded to vRanger Pro 4 – had the slow network backup issue and no VCB mode! Downgraded.
    vRanger experiences
    • Uses vStorage API.
      • Backups to ‘normal’ storage (e.g. NAS) incredibly quick after 1st full (1 TB file server backed up in 10 minutes).
      • No backups during office hours.
      • Deduplicated backup files.
    • Not the same performance with DataDomain:
      • Inline dedupe performed by DataDomain slows things down a bit.
      • Disable compression and deduplication options in backup job.
      • Changed block tracking means things still work well.
    • It ‘just works’.
    • No more babysitting the backups.
    Veeam Backup & Replication
    • Uses changed block tracking to replicate changes to offsite replica VM.
    • We synchronise nightly.
      • One full backup of each VM.
      • One replica pass for each VM.
        • Can keep previous versions of replica offsite for archiving purposes - negates need for backup?
    • Full backups of ‘large change’ servers still done to DataDomain using Veeam, then DD replicates to its offsite partner.
    • DataDomain also used for backup archiving.
    • One click DR testing of replica servers.
      • Failover/failback using Veeam console.
    Veeam Replicas
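The idea behind replicating with changed block tracking can be sketched in a few lines. This is a toy model under stated assumptions, not the vStorage API or Veeam's implementation: `TrackedDisk` and `apply_changes` are hypothetical names, and the disk is just a list of fixed-size blocks held in memory.

```python
class TrackedDisk:
    """Toy disk that remembers which blocks were written since the last sync."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks  # zero-filled blocks
        self.changed = set()                            # indexes of dirty blocks

    def write_block(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = data
        self.changed.add(index)

    def take_changes(self):
        """Return {index: data} for dirty blocks and reset the tracking set."""
        delta = {index: self.blocks[index] for index in sorted(self.changed)}
        self.changed.clear()
        return delta


def apply_changes(replica_blocks, delta):
    """Bring a replica (list of blocks) up to date using only the delta."""
    for index, data in delta.items():
        replica_blocks[index] = data


if __name__ == "__main__":
    source = TrackedDisk(num_blocks=1000)
    replica = list(source.blocks)        # initial full copy, done once

    source.write_block(3, b"abcd")       # a day's worth of changes
    source.write_block(7, b"wxyz")

    delta = source.take_changes()        # only two blocks cross the WAN
    apply_changes(replica, delta)
    assert replica == source.blocks
    print(len(delta), "of", len(source.blocks), "blocks sent")
```

Only the blocks written since the previous pass are sent, which is what makes a nightly replica pass feasible over a slow link once the first full copy is in place.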
  • TPT approach (3)
    • Veeam replicas
      • 20 servers up in approx 20 mins using failover function.
    • Veeam backups
      • 5 servers recovered from image level backups in approx 5 hours. Transactionally consistent.
    • Time taken for full network recovery – approx 6 hours.
      • If we had the bandwidth, would use 100% replicas.
    2010 – DR test
    • Veeam SureBackup – TPT wins:
      • Automatic verification testing.
      • Item level recovery?
      • User self service for deleted files?
      • We can power on direct from DataDomain at both primary and recovery sites.
      • No more 5 hour wait for non-replica servers to be recovered. Instant recovery, then storage vMotion.
      • DR restore may be minutes rather than hours…
    Veeam SureBackup
    • Backups that work! No constant checking. ACHIEVED.
    • A backup archive.
      • On site and/or offsite. ACHIEVED
    • Minimal administration, vSphere integration. Set it and forget it. ACHIEVED
    • Quick backups, within the available window. ACHIEVED
    • Quick restores.
      • File level. ACHIEVED
      • Image level. ACHIEVED
      • Application level (e.g. Exchange mailboxes). NOT YET!
    • No tapes. ACHIEVED
    • Efficient use of storage (de-duplication). ACHIEVED
    • Secure backup data. ACHIEVED.
    Backups - what did we want?
    • Fast offsite recovery. ACHIEVED VS. OBJECTIVE
    • Consistent data. ACHIEVED.
    DR - what do we want?
    • Get rid of tape.
    • Recovery objective (and therefore, budget) will drive what is possible with DR.
    • If doing SAN-SAN mirroring, get the replication sizing right.
    • Newer storage systems offer increased integration with VMware. If you have the budget, make use of these.
    • Veeam is an excellent, cost effective alternative to costly SAN-level technology.
    In conclusion…
  • Thank You. Darren Bull, Business Support Manager, Verity House, Canal Wharf, Leeds LS11 5BQ. Tel. 0113 234 5500, Direct. 0113 394 2533, Fax. 0113 234 5599. E-mail: [email_address] www.thepensionstrust.org.uk