Deduplication Fundamentals
Data Domain Basics
Easy integration with existing environment
  • Control tier: backup and archive applications (EMC, Symantec, CommVault, IBM, BakBone Software, Vizioncore)
  • Target tier: DD890 appliance, connected over Ethernet (CIFS, NFS, NDMP, DD Boost) or as a Virtual Tape Library (VTL) over Fibre Channel
  • Disaster recovery tier: DD890 appliance, fed by replication
DD890 appliance
2U
2 to 10 ports
10 and 1 Gigabit Ethernet; 8 Gb/s Fibre Channel
RAID 6
Up to 285 TB usable capacity with shelves
2 TB or 1 TB 7.2K rpm SATA HDD in shelf
File system
NVRAM
N+1 fans and redundant, hot-plug power supplies

Data Deduplication: Technology Overview
Store more backups in a smaller footprint
[Figure: lettered data segments (A–L) making up the Friday full backup, the Monday to Thursday incrementals, and the second Friday full backup]

Backup data             Logical   Estimated reduction   Physical
Friday full             1 TB      2–4x                  250 GB
Monday incremental      100 GB    7–10x                 10 GB
Tuesday incremental     100 GB    7–10x                 10 GB
Wednesday incremental   100 GB    7–10x                 10 GB
Thursday incremental    100 GB    7–10x                 10 GB
Second Friday full      1 TB      50–60x                18 GB
Total                   2.4 TB    7.8x                  308 GB
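The figure suggests why the second Friday full needs so little new physical space: its lettered segments were already stored by earlier backups, so only references to them are kept. A minimal sketch of segment-level deduplication follows, assuming fixed-size segments and SHA-256 fingerprints; a production system would likely use variable-length, content-defined segments, and every name here (`SegmentStore`, `SEGMENT_SIZE`, `split_segments`) is hypothetical rather than part of any Data Domain interface.

```python
import hashlib

# Hypothetical: tiny fixed-size segments keep the example readable.
SEGMENT_SIZE = 4


def split_segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Split a byte stream into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class SegmentStore:
    """Toy deduplicating store: each unique segment is kept once, keyed by fingerprint."""

    def __init__(self) -> None:
        self.segments: dict[str, bytes] = {}  # fingerprint -> segment bytes
        self.logical_bytes = 0  # total bytes the backup application has written

    def write(self, data: bytes) -> list[str]:
        """Store a backup; return its recipe (ordered fingerprints) for later restore."""
        recipe = []
        for seg in split_segments(data):
            fp = hashlib.sha256(seg).hexdigest()
            self.segments.setdefault(fp, seg)  # only previously unseen segments use space
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Reassemble a backup from its recipe."""
        return b"".join(self.segments[fp] for fp in recipe)

    @property
    def physical_bytes(self) -> int:
        return sum(len(seg) for seg in self.segments.values())


store = SegmentStore()
friday_full = store.write(b"AAAABBBBCCCCDDDD")
monday_incr = store.write(b"EEEEFFFF")
second_friday_full = store.write(b"AAAABBBBCCCCDDDDEEEE")
print(store.logical_bytes, store.physical_bytes)  # 44 24
```

The recipe is what lets a restore reassemble the second full backup even though every one of its segments was first written by an earlier backup.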

Retain: Store More for Longer with Less
Over one year of retention in 3U of Data Domain deduplication storage

Backup data          Cumulative logical   Estimated reduction   Physical
First full           1 TB                 4x                    250 GB
Week 1 (April 7)     2.4 TB               8x                    308 GB
Week 2 (April 14)    3.8 TB               10x                   366 GB
Week 3 (April 21)    5.2 TB               12x                   424 GB
Month 1 (April 28)   6.6 TB               14x                   482 GB
Month 2 (May 31)     12.2 TB              17x                   714 GB
Month 3 (June 30)    17.8 TB              19x                   946 GB
Month 4 (July 31)    23.4 TB              20x                   1,178 GB
Total                23.4 TB              20x                   1,178 GB
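The estimated reduction column is cumulative logical data divided by physical data, rounded to a whole number. A short check of the table's own figures, assuming decimal units (1 TB = 1,000 GB), also shows that each week after the first full adds 1.4 TB of logical data but only 58 GB of physical storage. The names `ROWS` and `reduction` are illustrative only.

```python
# (label, cumulative logical TB, physical GB), copied from the table above
ROWS = [
    ("First full", 1.0, 250),
    ("Week 1", 2.4, 308),
    ("Week 2", 3.8, 366),
    ("Week 3", 5.2, 424),
    ("Month 1", 6.6, 482),
    ("Month 2", 12.2, 714),
    ("Month 3", 17.8, 946),
    ("Month 4", 23.4, 1178),
]


def reduction(logical_tb: float, physical_gb: float) -> float:
    """Deduplication ratio, assuming decimal units (1 TB = 1,000 GB)."""
    return logical_tb * 1000 / physical_gb


ratios = [round(reduction(logical, physical)) for _, logical, physical in ROWS]
print(ratios)  # [4, 8, 10, 12, 14, 17, 19, 20]

# Week-over-week growth for the first four weeks: (logical TB added, physical GB added)
weekly = [
    (round(b[1] - a[1], 1), b[2] - a[2]) for a, b in zip(ROWS[:4], ROWS[1:5])
]
print(weekly)  # [(1.4, 58), (1.4, 58), (1.4, 58), (1.4, 58)]
```

Because physical growth per week is small and constant while logical data keeps accumulating, the cumulative ratio climbs the longer backups are retained.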

Data Integrity: Data Invulnerability Architecture
End-to-end data verification
  • Generate checksum, then deduplicate and write to disk
  • Verify data: re-checksum and compare
  • File system: verify the file system metadata integrity
  • Deduplication: verify user data integrity
  • Local compression
  • RAID: verify stripe integrity
Self-healing file system
  • Cleaning: expired data
  • Defrag
  • Verify
Other
  • RAID 6
  • NVRAM
  • Snapshots
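The write path on this slide (generate a checksum, write, then re-checksum and compare) and the self-healing read path described in note #6 can be illustrated with a toy block store. This is a minimal sketch under stated assumptions: SHA-256 stands in for whatever checksum the system actually uses, and a second in-memory copy stands in for RAID 6 reconstruction. `VerifyingStore`, `checksum`, and `scrub` are hypothetical names.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class VerifyingStore:
    """Toy block store with write-time verification and self-healing reads (hypothetical)."""

    def __init__(self) -> None:
        self.disk: dict[str, bytes] = {}  # block id -> bytes as stored
        self.redundancy: dict[str, bytes] = {}  # stand-in for RAID 6 reconstruction
        self.checksums: dict[str, str] = {}  # block id -> checksum taken at write time

    def write(self, block_id: str, data: bytes) -> None:
        self.checksums[block_id] = checksum(data)  # 1. generate checksum on arrival
        self.disk[block_id] = data  # 2. write to "disk"
        self.redundancy[block_id] = data
        if self.read(block_id) != data:  # 3. read back, re-checksum, and compare
            raise OSError(f"write verification failed for {block_id}")

    def read(self, block_id: str) -> bytes:
        data = self.disk[block_id]
        if checksum(data) != self.checksums[block_id]:
            repaired = self.redundancy[block_id]  # self-heal from redundant data
            if checksum(repaired) != self.checksums[block_id]:
                raise OSError(f"unrecoverable corruption in {block_id}")
            self.disk[block_id] = repaired
            data = repaired
        return data

    def scrub(self) -> None:
        """Background pass (weekly, per note #6) that re-verifies every block."""
        for block_id in list(self.disk):
            self.read(block_id)


vstore = VerifyingStore()
vstore.write("seg-1", b"backup payload")
vstore.disk["seg-1"] = b"backup paylaod"  # simulate corruption on disk
vstore.scrub()
print(vstore.disk["seg-1"])  # b'backup payload'
```

The same checksum recorded at write time drives all three checks: the immediate read-back, the background scrub, and every later restore.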

Network-Efficient Replication for True Disaster Recovery
Lowers WAN costs; improves service level agreements
[Figure: Data Domain systems replicating over a WAN]
Flexible replication:
  • One-to-many
  • Many-to-one
  • Bi-directional
  • System-to-system
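Note #9 says DD Replicator transfers only compressed, deduplicated data over the WAN. A minimal sketch of why that lowers WAN cost, assuming the source can consult the set of segment fingerprints the replica already holds (here a local Python set; a real system would have to learn this over the network) and using SHA-256 and zlib as stand-ins: only segments the replica lacks are compressed and sent. `replicate` is a hypothetical name, not a DD Replicator interface.

```python
import hashlib
import zlib


def replicate(backup: list[bytes], replica_index: set[str]) -> list[tuple[str, bytes]]:
    """Return (fingerprint, compressed segment) pairs the replica does not yet hold.

    replica_index is updated in place to reflect what the replica will hold after
    the transfer, so a repeated segment is never sent twice.
    """
    payload = []
    for segment in backup:
        fp = hashlib.sha256(segment).hexdigest()
        if fp not in replica_index:
            payload.append((fp, zlib.compress(segment)))
            replica_index.add(fp)
    return payload


replica: set[str] = set()
monday = [b"A" * 4096, b"B" * 4096, b"C" * 4096]
tuesday = [b"A" * 4096, b"B" * 4096, b"D" * 4096]  # only the last segment changed

sent_monday = replicate(monday, replica)
sent_tuesday = replicate(tuesday, replica)
print(len(sent_monday), len(sent_tuesday))  # 3 1
print(sum(len(c) for _, c in sent_tuesday), "bytes on the wire instead of", 3 * 4096)
```

In this sketch the topology (one-to-many, many-to-one, and so on) does not change the logic; each source and replica pair would simply track what that replica already holds.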

Editor's Notes

  • #6 Another important differentiator for Data Domain systems is the Data Invulnerability Architecture. Data Domain Data Invulnerability Architecture provides the industry's best defense against data integrity issues, with unprecedented levels of data protection, data verification, and self-healing capabilities that are unavailable in conventional disk or tape systems.
    There are three key areas of data integrity protection described on this slide.
    First is end-to-end data verification at backup time. As illustrated by the graphic at the right, end-to-end verification means reading data after it is written and comparing it to what was sent to disk, proving that it is reachable through the file system to disk and that it is not corrupted. Specifically, when the Data Domain Operating System receives a write request from backup software, it computes a checksum over the data. After analyzing the data for redundancy, it stores the new data segments and all of the checksums. After all the data has been written to disk, the Data Domain Operating System verifies that it can read the entire file from the disk platter and through the Data Domain file system, and that the checksums of the data read back match the checksums of the written data. This confirms that the data is correct and recoverable from every level of the system. If there is a problem anywhere along the way (for example, a bit that has flipped on a disk drive), it will be caught. Because most restores happen within a day or two of a backup, systems that verify or correct data integrity slowly over time will be too late for most recoveries.
    Second is a self-healing file system. Data Domain systems actively re-verify the integrity of all data every week in an ongoing background process. This scrub process finds and repairs defects on the disk before they can become a problem. In addition, real-time error detection ensures that all data returned to the user during a restore is correct. On every read from disk, the system first verifies that the block read from disk is the block expected. It then uses the checksum to verify the integrity of the data. If any issue is found, the Data Domain Operating System self-heals and corrects the data error.
    Third, in addition to data verification and self-healing, there is a collection of other capabilities: RAID 6 provides double disk failure protection; NVRAM enables fast, safe restart; and snapshots provide point-in-time file system recoverability.
    Backups are the data store of last resort. Data Domain Data Invulnerability Architecture provides extra levels of data integrity protection to detect faults and repair them, so that neither backup data nor recovery is put at risk.
  • #9 In addition to DD Boost, EMC offers four additional Data Domain software options that can enhance the value of a Data Domain system in your environment.
    The first is DD Virtual Tape Library software, which eliminates tape-related failures by enabling all Data Domain systems to emulate multiple tape devices over a Fibre Channel interface. This software option provides easy integration of deduplication storage in open systems and IBM i environments.
    Next is DD Replicator software, which provides fast, network-efficient, encrypted replication for disaster recovery, remote office data protection, multi-site tape consolidation, and long-term offsite retention. DD Replicator asynchronously transfers only the compressed, deduplicated data over the WAN, making network-based replication cost-effective, fast, and reliable. In addition, you can replicate up to 270 remote sites into a single Data Domain system for consolidated protection of your distributed enterprise.
    Next, DD Retention Lock software enables you to easily implement deduplication with file locking to satisfy IT governance and compliance policies for archive protection. DD Retention Lock also enables electronic data shredding on a per-file basis to ensure that deleted files have been disposed of in an appropriate and permanent manner, in order to maintain confidentiality of classified material, limit liability, and enforce privacy requirements.
    Finally, DD Encryption software protects backup and archive data stored on Data Domain systems with encryption that is performed inline, before the data is written to disk. Encrypting data at rest satisfies internal governance rules and compliance regulations and protects against theft or loss of a physical system. The combination of inline encryption and deduplication provides the most secure data-at-rest encryption solution available.
  • #10 Like other Data Domain systems, Data Domain Archiver includes a controller and storage shelves, referred to as the “active tier” in this system. The active tier can be expanded to up to four storage shelves (96 TB of usable capacity), and it is used for short-term (generally less than 90 days) retention of backup and archive data. In addition, DD Archiver incorporates an “archive tier” with up to 23 additional storage shelves (474 TB of usable capacity). Built on a standard Data Domain controller, DD Archiver leverages existing Data Domain technology to deliver throughput of up to 9.8 TB/hr. DD Archiver is cost-optimized for long-term retention of backup and archive data: up to a total of 570 TB usable, or 28.5 PB of logical capacity assuming a 50:1 deduplication ratio. The system uniquely combines a low cost per gigabyte with high throughput. Finally, new fault isolation capabilities ensure long-term recoverability of archive units.
    All of this leverages existing Data Domain system advantages, including support for network-efficient replication with DD Replicator as well as DD Retention Lock for enforcing file retention. In addition, Data Domain’s Data Invulnerability Architecture ensures data integrity for the life of the system.
    The combination of high-throughput, cost-optimized storage built on proven Data Domain system technology makes DD Archiver the perfect tape replacement solution.
  • #11 Here’s a look at the latest Data Domain product family, including the recently introduced DD800 series, Data Domain Global Deduplication Array, and Data Domain Archiver (the system for long-term retention of backup and archive data).
  • #17 OPTIONAL SLIDE
    EMC Global Services is a large component of your total EMC experience. EMC Global Services allows you to:
    Save money by:
      • Significantly lowering your implementation and operating expenditures
      • Filling internal resource gaps for less
      • Protecting your investments in EMC solutions
    Accelerate time to value by:
      • Reducing deployment time
      • Accelerating return on investment for new projects
      • Easing the burden of compliance while protecting critical business information
    Mitigate risk and get better results by:
      • Configuring the solution to meet your requirements
      • Improving your service levels and reducing your management costs
      • Using EMC best practices and unmatched product expertise to deliver a superior customer experience
      • Reducing disruption while taking advantage of the features and benefits of the latest EMC products and solutions
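The DD Archiver capacity figures in note #10 are consistent with one another: the two tiers add up to the stated usable total, and the logical capacity follows from the 50:1 ratio the note assumes. A quick check, using decimal units (1 PB = 1,000 TB); the constant and function names are illustrative only.

```python
ACTIVE_TIER_TB = 96  # up to four shelves (note #10)
ARCHIVE_TIER_TB = 474  # up to 23 additional shelves (note #10)
ASSUMED_DEDUP_RATIO = 50  # the note's stated 50:1 assumption


def logical_capacity_pb(usable_tb: float, dedup_ratio: float) -> float:
    """Logical capacity in PB, assuming decimal units (1 PB = 1,000 TB)."""
    return usable_tb * dedup_ratio / 1000


usable_tb = ACTIVE_TIER_TB + ARCHIVE_TIER_TB
print(usable_tb, logical_capacity_pb(usable_tb, ASSUMED_DEDUP_RATIO))  # 570 28.5
```

Logical capacity scales linearly with the deduplication ratio actually achieved, so at a lower ratio the same 570 TB of usable storage holds proportionally less logical data.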