Dmg emc-avamar-optimized-backup-recovery-dedupe[1]


Data Mobility Group Research Perspective

EMC Avamar: Optimized Backup and Recovery with Source/Global Data De-duplication

Joseph Martins and Walter Purvis
January 9, 2009

Copyright © 2002-2009 Data Mobility Group, LLC. All Rights Reserved. Reproduction of this publication without prior written permission is forbidden. Data Mobility Group believes the statements contained herein are based on accurate and reliable information. However, because information is provided to Data Mobility Group from various sources, we cannot warrant that this publication is complete and error-free. Data Mobility Group disclaims all implied warranties, including warranties of merchantability or fitness for a particular purpose. Data Mobility Group shall have no liability for any direct, incidental, special, or consequential damages or lost profits. The opinions expressed herein are subject to change without notice.

datamobilitygroup.com
76 Northeastern Blvd. Suite 29A, Nashua NH 03062 Phone: 603.835.6141 Fax: 877.254.4886

Abstract: Data growth will continue to outpace the growth of IT budgets for the foreseeable future and IT departments will be expected to manage ever more data with proportionally fewer resources and staff. Now more than ever organizations need cost-effective data protection solutions. Sustainable, reliable access to digital information (office documents, email, instant messages, online transactions, images, video, and more) is imperative and essential. Disk-based backup solutions that use data de-duplication technology provide an affordably sustainable, more manageable alternative to traditional tape-based backups. This DMG Research Perspective examines the data protection challenges found in remote office and VMware environments, and presents the advantages of deploying an EMC Avamar solution to meet those challenges.

Data Protection Challenges

Traditional Tape and Disk Backups

Moving away from tape-based backup and recovery infrastructure is a strategic imperative, especially for companies with resource-strapped remote offices. There are several well-known problems with tape-based backup and recovery:
• A lack of experienced on-site staff (especially in branch offices)
• Unacceptably slow recovery times from off-site storage
• Inefficient recovery of small numbers of lost files
• Unreliable processes that are especially prone to human error
• Security concerns due to lost or stolen tapes
• Difficulty and cost of responding to e-discovery requests and compliance-related inquiries

Yet many organizations continue to tolerate the expense and headaches of tape, either because they are unaware of better disk-based alternatives or because they believe that disk-based alternatives are still too expensive.

Some organizations have purchased disk-based virtual tape library (VTL) systems. VTLs do reduce or eliminate some of the headaches of tape backup management, and they allow organizations to leverage their investments in Fibre Channel infrastructure and continue using their existing backup processes. However, in a simple head-to-head total cost of ownership (TCO) comparison, backing up to disk (without de-duplication) is still considerably more expensive than backing up to tape.

Storage-Hungry Virtualization

Rapid data growth, backup multiplicity, and highly redundant virtual computing environments conspire to make a bad situation worse. Driven largely by power, floor space, and manageability constraints, organizations have embraced server virtualization as a way to consolidate many physical servers into fewer physical servers running large numbers of virtual machines.

Unfortunately, server virtualization massively increases the resource consumption of traditional tape and disk backup processes. Running traditional backup software on individual virtual machines results in contention for the underlying physical server’s network bandwidth, CPU, memory, and disk, making it very difficult to meet shrinking backup windows.
Running traditional backup software at the host server level, meanwhile, consumes a large amount of disk space (for example, copies of VMware’s virtual machine disk (VMDK) files might be 10, 50, 100, or more gigabytes each). Because so few of the files within cloned VMDKs change on a day-to-day basis, it makes no sense to regularly back up dozens or hundreds of duplicate copies across identical VMDKs. The costs of network bandwidth and disk storage alone make this an untenable approach.
The Economics of Data De-duplication

In its broadest sense, de-duplication is the removal of redundant data from a defined set of data. Unlike traditional data compression methods that are applied to individual files, de-duplication can be applied across the files in a dataset and across storage devices at both the file and sub-file level, depending upon the vendor’s offering.

The cost benefit of data de-duplication is undeniable even at ratios of just 10-to-1 or 20-to-1. DMG’s own year-long research project using EMC Avamar revealed a massive 76-to-1 disk storage savings over traditional daily full backups and greater than 18-to-1 space savings over traditional weekly full and daily incremental backups. On virtual servers the space savings across VMDKs can easily exceed 40-to-1.

And, while disk drives are relatively inexpensive, the fully loaded cost (energy, floor space, labor, maintenance, and so on) of operating and managing unnecessarily large disk systems is not. De-duplication squeezes as much capacity as possible out of the fewest storage devices to minimize costs across the board. In effect, data de-duplication makes disk-based backup solutions more affordable than tape.

Backup Data De-duplication Approaches

Target vs. Source De-duplication

Backup data de-duplication solutions differ significantly in terms of where they perform the process of finding redundant data. Generally speaking, data de-duplication can occur at the target or at the source, depending on the selected vendor solution. Where it occurs determines the impact on an organization’s ability to meet shrinking backup windows while leveraging existing infrastructure and resources.

Target de-duplication products, as their name implies, are typically backup targets for traditional backup software. Backup data is de-duplicated only when it reaches the target backup hardware device.
This means that all of the data from the backup source, including lots of redundant data, is sent across the network or virtual infrastructure during daily backup operations. In many situations, the vast majority of data is redundant and the result is wasted network and disk resources and unnecessarily lengthy backup processes. In bandwidth-constrained environments (for example, remote offices attempting to back up to a corporate data center over a WAN, or multiple virtual servers contending for the same network interface on one physical server), finding enough bandwidth to complete backups within available backup windows is often unaffordable or impossible. In addition, long-running backup jobs can deprive applications of needed server resources and have a negative impact on end-user productivity.

In contrast, EMC Avamar de-duplicates backup data at the source. As a result, redundant data is eliminated at the start of the backup process (at the client) before any data is moved across the network. The primary benefits of examining and de-duplicating backup data at the source are:

• Fast, efficient daily full backups, since only unique sub-file data is moved over the network
• Significantly lower resource contention across congested networks and virtual infrastructure
• Shorter required backup windows due to less data in flight
• Lower operating expenses and the ability to leverage existing network infrastructure

EMC Avamar also de-duplicates backup data globally, across sites and servers. Only a single copy of each sub-file, variable-length data segment is stored to disk during backup operations. As a result, Avamar can significantly reduce the total back-end disk storage required, in addition to providing the benefits of de-duplication at the source.

Does Data Segment Length Matter?

The short answer is yes. Leading data de-duplication solutions on the market reduce data at the sub-file level, but some use fixed-length data segments while others use variable-length segments. Data segment length, or more accurately, the ability to vary data segment length based on commonality within the data set, ensures maximum data reduction. After all, that is the purpose of de-duplication.
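The difference between fixed-length and variable-length segments can be illustrated with a small sketch. It is not Avamar's algorithm, which this paper does not describe in detail; it is a generic content-defined chunking toy. The function names (`fixed_chunks`, `variable_chunks`, `new_segments`), the tiny segment sizes, and the use of a plain hash over a 16-byte window in place of a true rolling hash are all assumptions chosen to keep the example short.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Cut the data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                    max_size: int = 256) -> list:
    """Cut wherever a hash of the last `window` bytes matches a bit pattern,
    so boundaries follow the content rather than absolute offsets."""
    chunks, start = [], 0
    for i in range(len(data)):
        length = i - start + 1
        if length < window:  # the window must fit inside the current segment
            continue
        digest = hashlib.blake2b(data[i - window + 1:i + 1], digest_size=4).digest()
        at_boundary = (int.from_bytes(digest, "big") & mask) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial segment
    return chunks


def new_segments(before: list, after: list) -> int:
    """Count segments in `after` whose hash was not already stored for `before`."""
    known = {hashlib.sha256(c).digest() for c in before}
    return sum(1 for c in after if hashlib.sha256(c).digest() not in known)


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed so the toy run is repeatable
    original = bytes(rng.getrandbits(8) for _ in range(4096))
    edited = b"!" + original  # insert a single byte at the front

    print("fixed-length, new segments:   ",
          new_segments(fixed_chunks(original), fixed_chunks(edited)))
    print("variable-length, new segments:",
          new_segments(variable_chunks(original), variable_chunks(edited)))
```

In a run like this the fixed-length splitter should report nearly every segment as new, because the one-byte insertion shifts every later boundary, while the content-defined splitter should re-align after the first few segments and report only a handful.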
As users edit their files or save new files, a de-duplication engine that uses variable-length data segments is better equipped to detect the changes and store only the new, unique segments during backup operations. For example, a fixed-length segment solution can be fooled by the insertion of a single new character into an existing file, since it erroneously views the logical shift in data as entirely new segments when, in fact, the original data is mostly unchanged. As a result, fixed-length data segment solutions are significantly less efficient and require additional network bandwidth and storage capacity over time.

EMC Avamar sports one of the most efficient backup data de-duplication engines available today and consistently outperforms the competition in de-duplication bake-offs. Its sub-file, variable-length de-duplication technology is not fooled by data insertions or deletions, so only the new, truly unique data segments are backed up. Avamar’s variable-length data segments are just 12 KB on average, significantly more efficient than fixed-length segment solutions that may have a minimum default fixed segment size of 128 KB, 256 KB, or more. It is easy to understand how Avamar efficiently de-duplicates data at the source (and globally across multiple sites) to minimize the amount of data moved across the network and ultimately stored to disk.

Scalability

Leading backup solutions make it easy to increase performance and capacity when needed, but not all are as simple as advertised. One vendor’s approach separates the backup data from its associated metadata. While this approach seems conceptually elegant, users quickly realize that separately managing and scaling the metadata and backup data can be an unsustainable nightmare. More boxes, more space, more power, and more system management overhead are exactly what most companies do not want.

The ability to simply drop in an additional self-contained box with incrementally more backup compute power and capacity provides the sort of organic scalability companies desire, without the unnecessary cost or complexity of scaling and managing the metadata and backup data separately.
EMC Avamar’s scalable grid architecture enables additional compute power and disk capacity by simply adding another Avamar server (node) to the grid, whether IT managers opt for its out-of-the-box Data Store, or install the software on their own commodity servers. Existing backup data is automatically load-balanced across the newly added server for maximum performance, without any downtime. No need to separately manage and plan for the growth of metadata and backup data. Just drop in a new box and go.
High Availability and Reliability

The idea of having to add (and pay for) additional hardware, software, licenses, and training to achieve high availability is unappealing. When high availability is native to an appliance, there is no need for the added complexity and the corresponding costs of additional external hardware and/or software.

Unlike Avamar’s closest competitor, which provides no high availability without the addition of external disk, specialized (and expensive) clustering software, and training, EMC Avamar’s Redundant Array of Independent Nodes (RAIN) architecture provides built-in high availability and fault tolerance across nodes. EMC Avamar nodes continuously communicate and cooperate without administrative intervention, automatically detect and configure new nodes, automatically check the Avamar server’s integrity twice daily, and verify data recoverability daily, all with no downtime.

Backup and Recovery Performance

Backup and recovery performance can be influenced by many factors, including the type of servers, network links, and other infrastructure considerations. However, the right data de-duplication technology can significantly increase performance, even across slow or congested environments.

As discussed earlier, data segment size makes a big difference in de-duplication efficiency, with the clear advantage going to solutions that de-duplicate data using variable-length data segments. Where the de-duplication occurs also plays an important part, since de-duplicating data at the source always results in less data to move across slow, congested physical or virtual environments.

Only EMC Avamar de-duplicates backup data at the source (and globally) using variable-length data segments. Not surprisingly, Avamar is ideally suited for challenging backup environments such as remote office / branch office (ROBO) and virtual environments (e.g. VMware).
Moving only the new, unique data segments during daily full backup operations means ROBO environments can leverage existing wide area network (WAN) links and centralize backup management. And many virtual environments can actually increase server consolidation levels, since Avamar removes the backup bottlenecks.
In all cases, Avamar delivers fast, daily full backups. And, Avamar’s single-step recovery eliminates the tedious process of recovering from the last good full and subsequent incremental backups to reach the desired recovery point. As a result, users can eliminate or greatly reduce their reliance on tape since Avamar provides efficient, affordable backups that enable data to be retained locally on disk for extended periods of time.

Deployment Options and Application Support

Given the incredibly diverse server, storage and application infrastructures found in modern business, deployment flexibility and application support are essential elements of any backup technology offering.

EMC Avamar offers the broadest range of deployment options of any source/global data de-duplication backup solution. Deployment options include:

• Avamar agents installed directly on the systems to be protected (great for smaller remote offices because it eliminates the need for extra local hardware).
• Avamar software installed on industry standard certified servers (perfect for organizations that wish to choose or reuse their own hardware).
• EMC Avamar Data Store, a pre-packaged, pre-configured solution consisting of Avamar software bundled with EMC hardware. Scalable from single to multiple nodes to provide the equivalent of up to several petabytes of cumulative traditional backup storage (a turnkey solution from EMC that simplifies ordering, deployment, and service).
• Avamar Virtual Edition, an industry first that enables an Avamar server to be deployed as a virtual appliance on an existing ESX server (to leverage existing compute power and disk storage).

And when it comes to protecting VMware environments, the Avamar software agent can be installed within the VM Guest, at the Service Console, or at the VMware Consolidated Backup (VCB) proxy server.
In all cases, Avamar efficiently de-duplicates backup data at the source, and globally across the entire environment.
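As a rough illustration of what de-duplicating "at the source, and globally" implies, the sketch below has each client hash its segments locally, ask a shared store which hashes are missing, and send only those. Every backup is recorded as a complete list of segment hashes, so a restore is a single lookup pass with no chain of incrementals to replay. The `SegmentStore` class and `backup` function are hypothetical names, fixed-size 8-byte segments are used only for brevity, and none of this is taken from Avamar's actual protocol.

```python
import hashlib


class SegmentStore:
    """Hypothetical shared store: one copy of each unique segment for all clients."""

    def __init__(self):
        self.segments = {}  # segment hash -> segment bytes
        self.backups = {}   # (client, label) -> ordered list of segment hashes

    def missing(self, hashes):
        """Return the hashes the store has never seen."""
        return {h for h in hashes if h not in self.segments}

    def commit(self, client, label, manifest, payload):
        """Store the newly sent segments and record the full manifest."""
        self.segments.update(payload)
        self.backups[(client, label)] = manifest

    def restore(self, client, label):
        """Single-step restore: every backup has a complete manifest."""
        return b"".join(self.segments[h] for h in self.backups[(client, label)])


def backup(store, client, label, data, segment_size=8):
    """Client side: hash locally, send only unknown segments. Returns bytes sent."""
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    manifest = [hashlib.sha256(s).hexdigest() for s in segments]
    by_hash = dict(zip(manifest, segments))
    payload = {h: by_hash[h] for h in store.missing(manifest)}
    store.commit(client, label, manifest, payload)
    return sum(len(s) for s in payload.values())


if __name__ == "__main__":
    store = SegmentStore()
    image = b"AAAAAAAABBBBBBBBCCCCCCCC"  # stand-in for a cloned VM image
    print("vm1 day1 bytes sent:", backup(store, "vm1", "day1", image))
    print("vm2 day1 bytes sent:", backup(store, "vm2", "day1", image))
    print("vm1 day2 bytes sent:",
          backup(store, "vm1", "day2", b"AAAAAAAABBBBBBBBDDDDDDDD"))
    print("unique segments stored:", len(store.segments))
```

The second client sends nothing because its image is identical to one already stored, which is the cloned-VMDK case described earlier; the day-two backup sends only the one changed segment yet can still be restored in full.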
EMC Avamar also supports the broadest array of enterprise applications of any source de-duplication technology. Coverage ranges from Avamar’s proven Virtual Edition (a certified VMware virtual appliance) to its unrivaled support for NetWare, Oracle, DB2, VMware ESX 3.5, Windows Vista, Windows NT, Unix, Linux, Mac OS, NetApp and EMC Celerra filers, MS Exchange, MS SQL, and other key applications and environments.

EMC Avamar also integrates with EMC NetWorker. As a result, NetWorker users can deploy a single agent and decide which servers to de-duplicate via the NetWorker Management Console to leverage their existing interface and schedules.

Reporting

One of the key components of any backup solution, essential to maximize productivity and minimize downtime, is a detailed management user interface and reporting tool. EMC Avamar’s Enterprise Manager dashboard provides an intuitive, at-a-glance view of the entire Avamar environment. Combined with the Avamar Administrator, Avamar delivers user-friendly, powerful native backup management and reporting, integration with EMC Backup Advisor, and a point-and-click interface that minimizes the number of clicks necessary to complete most tasks.

Real World Avamar

The results of Avamar’s performance during our in-house 2007 road test were crystal clear. Still, we wanted to find out if other companies experienced similar benefits in much larger, more distributed environments. Fortunately, we had the opportunity to speak to the director of one such operation at a leading Fortune 10 multinational corporation. His organization provided great service to its data centers, but he wanted to cost-effectively improve the level of service and support to some 300 remote offices, each with 1-2 terabytes of data onsite.
His challenges were many:

• A large amount of data distributed across an equally large number of remote offices
• Massive data growth (double and even triple digit)
• Shrinking backup windows
• A need to cost-effectively store more data for longer periods of time
• The risks associated with tape jockeying and having media in third-party hands offsite
• A need to support a broad range of operating systems and applications

Avamar was chosen for its broad client support, manageability, superior de-duplication, and speedy full backups, among other reasons. The company expects its data volume to double in 2009. With 35-day retention policies for daily workloads, in addition to single monthly and 1-7 annual backups, Avamar’s de-duplication will continue to keep a leash on data growth. Backup windows have improved 33-45% and clients are now backed up in 6 hours or less. And the company is able to use a single solution across a variety of platforms, from HP-UX, Linux, and Solaris to SQL Server, Oracle, and NAS.

The company’s next big push is for the use of Avamar in the data center and greater replication between facilities. The plan is to retain short-term data onsite, and retain long-term data in an offsite facility using Avamar.

Summing up

EMC Avamar is just one part of a broad portfolio of backup and recovery solutions that EMC has assembled to satisfy the data protection needs of nearly any organization. In addition to Avamar’s source-based de-duplication, EMC also integrates Avamar into EMC NetWorker, and offers target-based de-duplication solutions with its Disk Library DL1500, DL3000 and DL4000 Series products.

Data Mobility Group has found EMC Avamar to be one of the best de-duplicating backup and recovery solutions available today. In February 2008, we published the results of a 13-month, in-house EMC Avamar road test [1]. EMC’s Avamar technology made it possible for one person to set up, schedule, monitor and manage more than 365 full daily backups of nearly 200 GB of data distributed across several servers.
By the end of the road test the system had consumed less than 1/76th the capacity required by traditional daily full backups, 1/18th the capacity required by traditional weekly full and daily incremental backups, and the backups occurred very quickly with minimal network impact.
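The capacity comparison above can be approximated with simple arithmetic. The sketch below uses the dataset size and test duration reported in this paper (nearly 200 GB, 13 months, taken here as 395 days), but the two change rates are not given in the report: the 10% daily file-level change and 1% daily new-unique-segment rate are assumptions, chosen only so that the totals land near the reported figures. Function names are hypothetical.

```python
import math

GB_PER_TB = 1000


def daily_fulls(dataset_gb, days):
    """Every day stores a complete copy of the dataset."""
    return dataset_gb * days


def weekly_full_daily_incremental(dataset_gb, days, changed_fraction):
    """One full per week started; every other day stores only changed files."""
    fulls = math.ceil(days / 7)
    incrementals = days - fulls
    return dataset_gb * fulls + dataset_gb * changed_fraction * incrementals


def deduplicated(dataset_gb, days, new_unique_fraction):
    """The first backup stores everything once; each later backup adds only
    segments that have never been stored before."""
    return dataset_gb + dataset_gb * new_unique_fraction * (days - 1)


if __name__ == "__main__":
    dataset_gb, days = 200, 395  # from the road test; 395 days is an approximation
    totals = {
        "daily fulls": daily_fulls(dataset_gb, days),
        "weekly fulls + daily incrementals":
            weekly_full_daily_incremental(dataset_gb, days, 0.10),  # assumed rate
        "de-duplicated daily fulls":
            deduplicated(dataset_gb, days, 0.01),                   # assumed rate
    }
    for name, gb in totals.items():
        print(f"{name}: {gb / GB_PER_TB:.2f} TB")
```

With these assumed rates the three totals come to roughly 79 TB, 18 TB, and just under 1 TB, which is the same order as the road test results; different change rates would shift the ratios accordingly.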
Figure: Total Backup Storage Consumed Over 13 Months
• EMC Avamar (daily fulls): < 1 TB
• Traditional backup methods (weekly fulls and daily incrementals): ~18 TB
• Traditional backup methods (daily fulls): ~76 TB

EMC Avamar has one of the most efficient data de-duplication engines available today, consistently outperforming the competition in de-duplication bake-offs. The graphic above illustrates just how effective Avamar can be after one year in an ordinary office environment such as DMG’s. Its variable-length, sub-file de-duplication at the source (and globally across multiple sites) minimizes the amount of data stored on backups and moved over the network.

The Avamar product lineup offers outstanding flexibility, manageability, reliability, infrastructure reusability, and proven cost savings. There are many environments in which EMC Avamar could be usefully deployed, but it is particularly advantageous for organizations that have remote office environments, extensive VMware deployments, or a need for LAN-based backup within their data centers.

Organizations in search of an affordably sustainable, reliable, more manageable alternative to tape-based backup cannot afford to overlook EMC Avamar.

Footnotes

[1] High Value Remote Office Data Protection With EMC Avamar, published February 6, 2008.