
EMC Deduplication Fundamentals

Deduplication reduces the amount of disk storage needed to retain and protect data by ratios of 10-30x and greater, making disk a cost-effective alternative to tape. Data on disk is available online and onsite for longer retention periods, and restores become fast and reliable. Storing only unique data on disk also means that data can be cost-effectively replicated over existing networks to remote sites for disaster recovery and consolidated tape operations.


  1. Deduplication Fundamentals
  2. Data Domain Basics
     Easy integration with existing environment
     [Diagram: control tier (backup and archive applications from EMC, Symantec, CommVault, IBM, BakBone Software, Vizioncore); target tier (DD890 appliance, reached via CIFS, NFS, NDMP, or DD Boost over Ethernet, or via Virtual Tape Library (VTL) over Fibre Channel); disaster recovery tier (DD890 appliance, fed by replication)]
       • 2U
       • 2 to 10 ports
       • 10 and 1 Gigabit Ethernet; 8 Gb/s Fibre Channel
       • RAID 6
       • Up to 285 TB usable capacity with shelves
       • 2 TB or 1 TB 7.2K rpm SATA HDD in shelf
       • File system
       • NVRAM
       • N+1 fans and redundant, hot-plug power supplies
  3. Data Deduplication: Technology Overview
     Store more backups in a smaller footprint
     [Diagram: a Friday full backup, Monday to Thursday incrementals, and a second Friday full backup, each drawn as a sequence of segments labeled A to L; each unique segment is stored only once]

     Backup                  Logical data   Estimated reduction   Physical
     Friday full             1 TB           2–4x                  250 GB
     Monday incremental      100 GB         7–10x                 10 GB
     Tuesday incremental     100 GB         7–10x                 10 GB
     Wednesday incremental   100 GB         7–10x                 10 GB
     Thursday incremental    100 GB         7–10x                 10 GB
     Second Friday full      1 TB           50–60x                18 GB
     Total                   2.4 TB         7.8x                  308 GB
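The segment diagram above can be illustrated with a minimal sketch of fingerprint-based deduplication. This is not Data Domain's implementation: the `DedupStore` class, the use of SHA-256 as the fingerprint, and the assignment of segment letters to each backup are all assumptions made for illustration.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    """Identify a segment by a hash of its content (SHA-256 is an assumption)."""
    return hashlib.sha256(segment).hexdigest()


class DedupStore:
    """Hypothetical store that keeps each unique segment once."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes (physical data)
        self.recipes = {}   # backup name -> ordered list of fingerprints

    def write_backup(self, name, segments):
        """Store a backup and return how many segments were new."""
        new = 0
        recipe = []
        for seg in segments:
            fp = fingerprint(seg)
            if fp not in self.segments:
                self.segments[fp] = seg
                new += 1
            recipe.append(fp)
        self.recipes[name] = recipe
        return new

    def read_backup(self, name):
        """Rebuild a backup from its recipe of fingerprints."""
        return [self.segments[fp] for fp in self.recipes[name]]


# Illustrative week of backups; the letter-to-backup mapping is assumed.
week = {
    "friday_full": "ABCDEFG",
    "mon_incr": "ABH",
    "tue_incr": "CBI",
    "wed_incr": "EGJ",
    "thu_incr": "ACK",
    "second_friday_full": "BCDEFLGHA",
}

store = DedupStore()
new_counts = {
    name: store.write_backup(name, [c.encode() for c in letters])
    for name, letters in week.items()
}
logical = sum(len(letters) for letters in week.values())
physical = len(store.segments)
print(new_counts)
print(f"{logical} logical segments stored as {physical} unique segments")
```

In this toy week the second full backup adds only one new segment, which mirrors why the table shows a much higher reduction for the second Friday full than for the first.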
  4. Retain: Store More for Longer with Less
     Over one year of retention in 3U of Data Domain deduplication storage

     Backup                Cumulative logical data   Estimated reduction   Physical
     First full            1 TB                      4x                    250 GB
     Week 1 (April 7)      2.4 TB                    8x                    308 GB
     Week 2 (April 14)     3.8 TB                    10x                   366 GB
     Week 3 (April 21)     5.2 TB                    12x                   424 GB
     Month 1 (April 28)    6.6 TB                    14x                   482 GB
     Month 2 (May 31)      12.2 TB                   17x                   714 GB
     Month 3 (June 30)     17.8 TB                   19x                   946 GB
     Month 4 (July 31)     23.4 TB                   20x                   1,178 GB
     Total                 23.4 TB                   20x                   1,178 GB
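The cumulative figures in the retention table are consistent with a fixed weekly increment: each week adds 1.4 TB of logical data (four 100 GB incrementals plus a 1 TB full) and 58 GB of physical data (4 × 10 GB plus 18 GB). A short sketch of that arithmetic follows; the function name and parameters are hypothetical, and 1 TB is taken as 1,000 GB, as the slide's own 1 TB at 4x = 250 GB implies.

```python
def cumulative_retention(
    weeks,
    first_full_logical_gb=1000,
    first_full_physical_gb=250,
    weekly_logical_gb=1400,  # 4 x 100 GB incrementals + 1 TB full
    weekly_physical_gb=58,   # 4 x 10 GB + 18 GB after deduplication
):
    """Return (week, logical_gb, physical_gb, reduction_ratio) per week."""
    logical = first_full_logical_gb
    physical = first_full_physical_gb
    rows = [(0, logical, physical, logical / physical)]
    for week in range(1, weeks + 1):
        logical += weekly_logical_gb
        physical += weekly_physical_gb
        rows.append((week, logical, physical, logical / physical))
    return rows


rows = cumulative_retention(16)
for week, logical, physical, ratio in rows:
    print(f"week {week:2d}: {logical / 1000:5.1f} TB logical, "
          f"{physical:5d} GB physical, {ratio:4.1f}x")
```

Sixteen weeks reproduces the Month 4 row (23.4 TB logical, 1,178 GB physical, roughly 20x). The ratio keeps rising because each new week adds far more logical than physical data.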
  5. Data Integrity: Data Invulnerability Architecture
     End-to-end data verification
       • Generate checksum; deduplicate and write to disk; verify data by re-checksumming and comparing
       • Storage stack: file system, deduplication, local compression, RAID
       • Checks: file system metadata integrity, user data integrity, stripe integrity
     Self-healing file system
       • Cleaning, expired data, defrag, verify
     Other
       • RAID 6
       • NVRAM
       • Snapshots
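The "generate checksum, write, re-checksum and compare" flow on the slide above can be sketched as a write-then-verify routine. This is a minimal illustration under stated assumptions, not the Data Invulnerability Architecture itself: the function names, the `IntegrityError` type, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import os
import tempfile


class IntegrityError(Exception):
    """Raised when stored data does not match its original checksum."""


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def write_verified(path: str, data: bytes) -> str:
    """Write data, read it back, and compare checksums before acknowledging."""
    expected = checksum(data)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to stable storage
    # Note: this read may be served from the OS page cache rather than the
    # device; a real system would need to verify what is actually on disk.
    with open(path, "rb") as f:
        stored = f.read()
    if checksum(stored) != expected:
        raise IntegrityError(f"checksum mismatch after writing {path}")
    return expected


def verify(path: str, expected: str) -> bool:
    """Re-checksum stored data later, for example during a background scan."""
    with open(path, "rb") as f:
        return checksum(f.read()) == expected


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "segment.bin")
    digest = write_verified(path, b"backup segment payload")
    ok_before = verify(path, digest)
    with open(path, "r+b") as f:  # simulate corruption of the first byte
        f.write(b"X")
    ok_after = verify(path, digest)

print(ok_before, ok_after)
```

Keeping the checksum alongside the data is what allows a later verification pass to detect corruption that happened after the original write.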
  6. Network-Efficient Replication for True Disaster Recovery
     Lowers WAN costs; improves service level agreements
     Flexible replication
       • One-to-many
       • Many-to-one
       • Bi-directional
       • System-to-system
       • Cascaded
     [Diagram: source remote sites (Data Domain systems holding database, archive, and backup data) each send 1–5% of their data over the WAN to the destination data center hub (Data Domain Global Deduplication Array)]
     Supports hundreds of remote sites
     95–99% cross-site bandwidth reduction
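The many-to-one case above, where remote sites send only a small fraction of their data to a hub, can be sketched as fingerprint-filtered replication: a site transfers only segments the hub does not already hold, including segments first contributed by other sites. This is an illustrative sketch, not the DD Replicator protocol. The `Site` class, the `replicate` function, and the sample data are hypothetical, and the cost of exchanging fingerprints is ignored.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


class Site:
    """Hypothetical deduplicated store at a remote site or at the hub."""

    def __init__(self, name):
        self.name = name
        self.segments = {}  # fingerprint -> segment bytes

    def store(self, segments):
        for seg in segments:
            self.segments[fingerprint(seg)] = seg


def replicate(source: Site, hub: Site):
    """Send only segments missing at the hub; return (bytes_sent, bytes_total)."""
    sent = 0
    for fp, seg in source.segments.items():
        if fp not in hub.segments:
            hub.segments[fp] = seg
            sent += len(seg)
    total = sum(len(seg) for seg in source.segments.values())
    return sent, total


def make_segments(prefix, count):
    """Build fixed-size 64-byte sample segments so byte counts are comparable."""
    return [f"{prefix}-{i:04d}".encode().ljust(64, b".") for i in range(count)]


common = make_segments("common", 95)  # data both sites happen to share
site_a = Site("site-a")
site_a.store(common + make_segments("a-only", 5))
site_b = Site("site-b")
site_b.store(common + make_segments("b-only", 5))
hub = Site("hub")

sent_a, total_a = replicate(site_a, hub)  # hub is empty: everything is sent
sent_b, total_b = replicate(site_b, hub)  # hub already holds the common data
reduction_b = 1 - sent_b / total_b
print(f"site-a sent {sent_a}/{total_a} bytes")
print(f"site-b sent {sent_b}/{total_b} bytes ({reduction_b:.0%} reduction)")
```

The second site benefits from segments the first site already delivered, which is the cross-site effect the slide describes; the 95% figure here comes purely from the sample data.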
  7. DD Boost Software
       • Distributes parts of the deduplication process to the backup server or application clients
       • Licensable software works across the Data Domain portfolio
       • Supports the majority of the backup software market
           • EMC Avamar and NetWorker
           • Symantec NetBackup and Backup Exec
       • Speeds backups by up to 50 percent
       • Process more backups with existing resources
       • 20–40 percent less overall impact on the backup server
       • 80–99 percent less LAN bandwidth
       • Enables Data Domain replication management from the backup application
  8. Additional Data Domain Software Options
     Data Domain Replicator
       • Network-efficient and encrypted
       • Transfers only compressed, deduplicated data over the WAN
       • Consolidate up to 270 remote sites into a single system
     Data Domain Virtual Tape Library
       • Easily integrates with Fibre Channel
       • Emulates multiple tape libraries
       • Supports open systems and IBM i operating environments
     Data Domain Encryption
       • Inline encryption of data at rest
       • Satisfies internal governance rules and compliance regulations
       • Protects against theft or loss of a physical system
     Data Domain Retention Lock
       • File locking to satisfy IT governance and compliance policies
       • Electronic data shredding
  9. DD Archiver Overview
     Cost-optimized long-term retention
     Data Domain system for backup and archive
       • Active tier: short-term data protection; less than 90 days
       • Archive tier: scalable long-term retention; multiple years
     High-throughput deduplication storage
       • Up to 9.8 TB/hr
     Cost optimized for long-term retention
       • Up to 570 TB usable, 28.5 PB logical capacity
       • Low cost per gigabyte while maintaining high throughput
       • Fault isolation of archive units for long-term recoverability
     Leverage existing Data Domain system advantages
       • Supports DD Replicator and DD Retention Lock software options
       • Data Domain Data Invulnerability Architecture to ensure data integrity
  10. Industry’s Most Scalable Inline Deduplication Systems
      [Diagram: product lineup: DD140 Remote Office Appliance, DD600 Appliance Series, DD800 Appliance Series, Global Deduplication Array, DD Archiver]
      Software options: DD Boost, DD Virtual Tape Library, DD Replicator, DD Retention Lock, and DD Encryption
  11. Deduplication Storage Evaluation Criteria
  12. Methodology: Inline versus Post-Process Deduplication
      Inline: deduplication before storing
        • Other activities unimpeded
        • Predictable
        • Simpler
      Post-process: deduplication after storing
        • 3x disk accesses to shared store
        • The more processes, the more resource contention
            • Copy to tape: too slow to stream tape
            • Recovery: service level agreement predictability
            • Replication: poor time-to-disaster-recovery
            • Deduplication: if interleaved with backup or restore
        • More administration to fight these issues
  13. Performance: CPU-Centric versus Spindle-Bound
      [Chart: throughput in MB/s (up to 6,000) versus number of disk spindles (50 to 200), comparing Data Domain with most deduplication vendors; other labels: Fibre Channel, SATA]
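The inline versus post-process distinction above can be made concrete by counting disk operations in a toy model. One reading of the slide's "3x disk accesses" is that post-process deduplication writes the raw data, reads it back, and then writes the deduplicated result; that reading is an assumption, as are the function names and the one-operation-per-segment cost model. Index lookups and metadata I/O are ignored.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


def inline_ingest(segments):
    """Deduplicate in memory first, then write only unique segments."""
    store = {}
    disk_ops = 0
    for seg in segments:
        fp = fingerprint(seg)
        if fp not in store:
            store[fp] = seg
            disk_ops += 1  # single write of a new unique segment
    return store, disk_ops


def post_process_ingest(segments):
    """Land raw data on disk first, then deduplicate it in a second pass."""
    staging = []
    disk_ops = 0
    for seg in segments:
        staging.append(seg)
        disk_ops += 1      # write of the raw segment to the staging area
    store = {}
    for seg in staging:
        disk_ops += 1      # read of the raw segment back from staging
        fp = fingerprint(seg)
        if fp not in store:
            store[fp] = seg
            disk_ops += 1  # write of the unique segment to the final store
    return store, disk_ops


# Ten incoming segments, only four of them unique.
incoming = [s.encode() for s in "ABCAABDCBA"]
inline_store, inline_ops = inline_ingest(incoming)
post_store, post_ops = post_process_ingest(incoming)
print(f"inline: {inline_ops} disk ops, post-process: {post_ops} disk ops")
```

Both paths end with the same stored segments; the difference in this model is only how much disk traffic it takes to get there, which is the contention the slide attributes to the post-process approach.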
  14. Data Domain Systems Trajectory
      Data Domain SISL Scaling Architecture: CPU-centric
      [Chart: throughput in GB/s over time (axis: 2004, 2010, 2011, Future), rising from 0.04 for the DD200 (2004) through 1.5, 3, and 5; labeled stages: single-controller with standard protocols, DD Boost, dual-controller Global Deduplication Array; 2014 (est.)]
      Improvement since 2004:
        • Throughput: ~175x
        • Capacity: ~450x
  15. Why Data Domain?
      Less disk to resource, less to manage
        • CPU-centric deduplication
        • Inline deduplication
      Simple, mature, and flexible
        • Simple, mature appliance
        • Any fabric, any software, backup or archive applications
      Resilience and disaster recovery
        • Storage of last resort
        • Fast time-to-DR readiness
        • Cross-site global compression
        • Data center or remote office
  16. Why EMC Global Services?
      Save money
        • Significantly lower implementation and operating expenditures
        • Fill internal resource gaps for less
        • Protect investments in EMC solutions
      Accelerate time to value
        • Reduce deployment time
        • Accelerate return on investment for new projects
        • Ease the burden of compliance while protecting critical business information
      Mitigate risk and get better results
        • Configure the solution to meet your requirements
        • Improve service levels; reduce management costs
        • EMC best practices and unmatched product expertise = superior customer experience
        • Reduce disruption while taking advantage of the features and benefits of the latest EMC products and solutions
