Your SlideShare is downloading. ×
  • Like
White Paper: EMC VNXe File Deduplication and Compression
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Now you can save presentations on your phone or tablet

Available for both IPhone and Android

Text the download link to your phone

Standard text messaging rates apply

White Paper: EMC VNXe File Deduplication and Compression

  • 2,339 views
Published

This White Paper describes EMC VNXe file deduplication and compression, a VNXe system feature that increases the efficiency with which network-attached storage (NAS) data is stored. …

This White Paper describes EMC VNXe file deduplication and compression, a VNXe system feature that increases the efficiency with which network-attached storage (NAS) data is stored.

Published in Technology , Business
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
2,339
On SlideShare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
42
Comments
0
Likes
0

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. White PaperEMC VNXe FILE DEDUPLICATION ANDCOMPRESSIONOverview Abstract This white paper describes EMC® VNXe™ File Deduplication and Compression, a VNXe system feature that increases the efficiency with which network-attached storage (NAS) data is stored. This paper outlines the advantages of increased storage efficiency. It also explains how File Deduplication and Compression work, the principles and technical architecture behind it, and how it interacts with other VNXe features and functions. October 2011
  • 2. Copyright © 2011 EMC Corporation. All Rights Reserved.EMC believes the information in this publication is accurate asof its publication date. The information is subject to changewithout notice.The information in this publication is provided “as is.” EMCCorporation makes no representations or warranties of any kindwith respect to the information in this publication, andspecifically disclaims implied warranties of merchantability orfitness for a particular purpose.Use, copying, and distribution of any EMC software described inthis publication requires an applicable software license.For the most up-to-date listing of EMC product names, see EMCCorporation Trademarks on EMC.com.VMware, VMware vCenter, and VMware View are registeredtrademarks or trademarks of VMware, Inc. in the United Statesand/or other jurisdictions. All other trademarks used herein arethe property of their respective owners.Part Number h8966 EMC VNXe’s Data Deduplication and Compression – Overview 2
  • 3. Table of ContentsExecutive summary ....................................................................................... 4 Audience............................................................................................................................ 4Terminology ................................................................................................. 4Detailed Overview ......................................................................................... 5 Data reduction technology choices..................................................................................... 5 Minimizing client impact .................................................................................................... 6 Targeting Files ................................................................................................................ 6 CPU Throttling ................................................................................................................ 7 CIFS Compression .......................................................................................................... 7 Access Time ....................................................................................................................... 9 Length of time in days the file has not been accessed ........................................................ 9 The space reduction process............................................................................................ 10Management .............................................................................................. 13 Enabling/Disabling Deduplication and Compression........................................................ 13 Deduplication Settings..................................................................................................... 16 Deduplication Status and Statistics.................................................................................. 16Client input/output to space-reduced files....................................................... 17 Read access ..................................................................................................................... 17 Write access..................................................................................................................... 18Interoperability with other VNXe features ........................................................ 18 Network and VNXe NDMP PAX (tar and dump) .................................................................. 18 Point-in-time views of the storage .................................................................................... 19 Replication....................................................................................................................... 19 VNXe File-Level Retention................................................................................................. 20 File Deduplication and Compression in a VMware Environment........................................ 20Deployment considerations .......................................................................... 20Conclusion ................................................................................................ 21References................................................................................................. 21 EMC VNXe’s Data Deduplication and Compression – Overview 3
  • 4. Executive summaryAs storage usage increases exponentially, improving storage efficiency is critical formany customers today. There are many viable solutions to achieve this objective. Thesolution you implement is based on what you believe is a suitable or effectiveapproach to achieve storage efficiency in your production environment. If you arelooking to reduce the cost of storing data at the file-level without affecting end-userexperience, you should consider the EMC® VNXe™ File Deduplication andCompression feature.The File Deduplication and Compression feature increases file storage efficiency byeliminating redundant data from the files stored in Shared-Folder or VMware NFS-Datastore storage, thereby saving storage space and cost. For each, file-leveldeduplication gives the Storage Processor (SP) the ability to process files in order tocompress them, and the ability to share the same instance of the data when the filesare identical. This feature is applied to Shared Folder and VMware NFS datastorestorage, and is transparent to access protocols.This whitepaper introduces File Deduplication and Compression’s product design,which offers ease of management and intelligent storage efficiency. This paperillustrates the concepts of File Deduplication and Compression and shows to managededuplication on VNXe. The feature is designed to be a simple but powerful additionto an already strong portfolio of VNXe features. Administrators can deploy this featurein existing VNXe storage environments. With this in mind, the paper discusses theinteroperability between File Deduplication and Compression, and other VNXefeatures.AudienceThis white paper is intended for use by EMC field personnel and customers who arefamiliar with EMC VNXe technology, but are not familiar with VNXe File Deduplicationand Compression.Terminology• Common Internet File System (CIFS) – An access protocol that allows users to access files and folders from Windows hosts located on a network. User authentication is maintained through Active Directory, and file access is determined by directory access controls.• Deduplication – The process used to compress redundant data, allowing space to be saved on a storage resource. When multiple files have identical data, the storage resource stores only one copy of the data, and shares that data between the multiple files. EMC VNXe’s Data Deduplication and Compression – Overview 4
  • 5. • Network File System (NFS) – An access protocol that allows users to access files and folders from Linux/UNIX hosts located on a network.• Network Data Management Protocol (NDMP) –A protocol that provides a standard for backing up file servers on a network. NDMP allows centralized applications to back up file servers running on various platforms and platform versions. NDMP reduces network congestion by isolating control path traffic from data path traffic, which permits centrally managed and monitored local backup operations.• Portable Archive Interchange (PAX) – An AIEEE/Posix standard archive protocol that works with standard UNIX tape formats and provides file-level backup and recovery operations.• Share – A named, mountable instance of shared-folder storage, accessible through a shared folder or VMware NFS datastore. Each share is accessible through the protocol (NFS or CIFS) defined for the shared folder where it resides.• Shared folder – A VNXe storage resource that provides access to individual file systems for sharing files and folders. Shared folders contain either Windows shares (which transfer data according to the CIFS protocol) or NFS shares (which transfer data according to the NFS protocol). NFS shares are sometimes referred to as NFS exports.• Snapshot – A read-only, point-in-time copy of data stored on the storage system. You can recover files from snapshots or restore a storage element to a snapshot.• Storage processor (SP) – A hardware component that provides the processing resources for performing storage operations such as creating, managing, and monitoring storage resources.• Unisphere™ – A web-based management environment for creating storage resources, configuring and scheduling protection for stored data, and managing and monitoring other storage operations on VNXe systems.Detailed OverviewData reduction technology choicesThere are a number of technologies classified under data reduction or deduplication.Two major data reduction technologies utilized in the VNXe are:• File-level deduplication – This provides relatively modest space savings. It does not require substantial CPU and memory resources to implement. EMC VNXe’s Data Deduplication and Compression – Overview 5
  • 6. • Compression – This is often regarded as a separate mechanism from deduplication. However, compression can also be viewed as variable, bit-level, intra-object deduplication. Compression alters the way data is stored, mainly to improve storage efficiency. In fact, it offers, by far, the greatest space savings of all the techniques listed for typical NAS data, and is relatively modest in terms of its resource footprint. It is relatively CPU-intensive but requires very little memory.File Deduplication and Compression combines file-level deduplication and data compressiontechnologies to provide a maximum benefit for the resources that are required to providestorage efficiency, while minimizing the client impact on mission-critical files. This featureprocesses file data, not metadata. If multiple files contain the same data but different names,the files are deduplicated. Deduplicated files can also have different permissions andtimestamps.For example, if there are 50 unique files that can be deduplicated in Shared Folder storage, 50unique files will still exist but the data will be compressed, yielding space savings. If there are70 identical copies of a presentation document in the storage resource, 70 files will still existbut they will all share the same file data. In this example, the data usage will decrease by afactor of almost 70. In addition, the one instance of the file data shared by the 70 files will alsobe compressed, providing further space savings.Minimizing client impactVNXe performs all deduplication processing as a background asynchronous operation that actson file data after it is written into the primary storage. It does not process data while file data isbeing written into the storage resource. This avoids latency in the client side, because access toproduction data is sensitive to latency.In addition to processing data in the background, File Deduplication and Compression avoidsprocessing the hot data in storage. The hot data is any file that is in active use by clients. Notethat hot data is defined by how recently clients accessed or modified the files. By notprocessing active files, you avoid introducing any performance penalty on the data that clientsand users are using to run their business. Typically only a small proportion of the data inproduction storage is in active use. This means that VNXe File Deduplication and Compressionprocesses the bulk of the data in storage without affecting the production workload.The policy engine uses a pre-defined default policy to scan the files in a deduplication-enabledstorage resource. This is based on analysis of how typical files age from active use to disuse invarious company settings of different industry and sector types.Targeting FilesThe File Deduplication and Compression feature targets older files and avoids new files that areconsidered active. Therefore, this feature scans each storage resource that is marked to beprocessed no more than once a day. Administrators can prompt the system to scan a specificstorage resource immediately, if required. EMC VNXe’s Data Deduplication and Compression – Overview 6
  • 7. File size is also a criterion to decide whether a file is a candidate for deduplication. FileDeduplication and Compression scans a file for its minimum and maximum size to determine ifit should be deduplicated.Administrators can define filters for VNXe to avoid processing file based on the file extension.Administrators can also define filters based on pathname to restrict directories from beingscanned.CPU ThrottlingThe impact of the deduplication processing on VNXe is controlled through an automatedscheduling mechanism; this mechanism is self-regulating based on the CPU load. Each SPscans and deduplicates only one storage resource at a time. While scanning and deduplicatingfiles, if VNXe detects that the CPU load of the SP exceeds a system-defined threshold, theprocess throttles its activity to a minimal level until it detects that the CPU load has decreasedto less than a low-activity threshold. This means that the deduplication and reduplicationprocesses effectively consume CPU cycles that would otherwise be idle, and does not affect thesystem’s ability to satisfy client activity.CIFS CompressionThe CIFS compression attribute feature allows Windows CIFS clients to manually deduplicateand reduplicate individual files or directories that do not meet the defined policy. WindowsExplorer shows the names of files marked in a different color (the default is blue). The systemattempts to preserve a deduplicated file in the deduplicated state even when it is written to. Assoon as deduplication is disabled, CIFS compression is no longer allowed until theadministrator enables or suspends deduplication.The VNXe system shows deduplicated files as being compressed to the CIFS client. As shown inFigure 1, CIFS clients can deduplicate and reduplicate individual files and directories via theWindows Explorer advanced attribute, Compress contents to save disk space. To do this, right-click a non-deduplicated file, and select Properties > Advanced > Compress contents to savedisk space. Marking a directory only affects new files that are added to the directory. It doesnot affect existing files in the directory. EMC VNXe’s Data Deduplication and Compression – Overview 7
  • 8. Figure 1 Manually compression files EMC VNXe’s Data Deduplication and Compression – Overview 8
  • 9. A full list of the VNXe deduplication settings can be seen shown in Table 1. Table 1 VNXe File Deduplication and Compression Policy Criteria Setting Definition Default Value 30 days Access Length of time in days the file Time has not been accessedModification Time Length of time in days the file has not been modified 30 daysMinimum Size Files below this size will not be deduplicated 24 KBMaximum Size Files greater than this size will not be deduplicated 8 TBDetection Method Hash used to detect duplicate files SHA1Case sensitive CIFS environment compatibility DisabledFile Extension Exclude List* Files with the following extensions will not be No extensions deduplicatedPathname Exclude List* Files with the following pathname will not be No pathnames deduplicatedMinimum Scan Interval Frequency with which the deduplication policy 7 days engine will scan a deduplication-enabled storage resourceProtection Storage Threshold At what usage capacity percentage of the storage 90% resource’s Protection Storage will the space reduction process not proceedHigh CPU Throttle If the CPU reaches this level, deduplication will 75% throttle downLow CPU Throttle If deduplication is throttled down and the CPU level 40% returns to this level, deduplication will throttle back upCIFS Compression Attribute Enables CIFS deduplication EnabledBackup Data Threshold At what percentage of the logical size of the file 90% should the space-reduced size be in order for NDMP to back up the file in its space-reduced format * Settings are configurable in Unisphere. EMC VNXe’s Data Deduplication and Compression – Overview 9
  • 10. The space reduction processFile Deduplication and Compression has a policy engine that specifies data for exclusion fromprocessing on Shared Folder or VMware NFS Datastore storage and decides whether todeduplicate specific files based on the policy criteria. Figure 2 depicts a typical Shared Folderstorage. It contains files with different attributes (i.e. last modified date, file size, and type). Figure 2 Shared Folder storage contains files with different attribute sWhen File Deduplication and Compression is enabled, it periodically scans the primary storagefor files that match the policy criteria (shown in Figure 3). The hidden store is dynamicallyallocated to hold deduplicated data and the hash table used for single instancing. Figure 3 Scanning a file against the policy criteria EMC VNXe’s Data Deduplication and Compression – Overview 10
  • 11. If the file meets the policy criteria, it is copied to the hidden store for compression and it ishashed. The hash is used to compare with other hashes in the hash table to determine if thereis another copy the policy engine has processed before. Since it is a unique hash, the hash isstored in the hash table for later comparisons (shown in Figure 4). Figure 4 File meets policy criteria and gets processe dThe space that the file data occupied in the user portion of the primary storage is freed and thefile’s internal metadata is updated to reference the copy of the data in the hidden store (shownin Figure 5). Figure 5 Stub file points to the file data in the hidden store EMC VNXe’s Data Deduplication and Compression – Overview 11
  • 12. If the data associated with the file has been identified before, the space it occupies is freed andthe internal file metadata is updated to reference the instance already in the hidden store(shown in Figure 6). Figure 6 A duplicate file is detecte dNote that the VNXe storage system detects non-compressible files and stores them in theiroriginal form. However, these files can still benefit from file-level deduplication. EMC VNXe’s Data Deduplication and Compression – Overview 12
  • 13. ManagementYou can manage the File Deduplication and Compression feature through the VNXe UnisphereManager graphical user interface (GUI) or command line interface (CLI).Enabling/Disabling Deduplication and CompressionFile Deduplication and Compression can be enabled when provisioning Shared Folder orVMware (NFS Datastore) storage. To enable this feature for Shared Folder storage, select theEnable Deduplication and Compression checkbox in the Configure Shared Folder Attributeswizard pane during storage creation (see Figure 7). Figure 7 Enable File Deduplication and Compression for Share d Folder storageTo enable this feature for VMware NFS Datastore storage, select the Enable Deduplication andCompression checkbox in the Specify File System Type wizard pane during storage creation(see Figure 8). EMC VNXe’s Data Deduplication and Compression – Overview 13
  • 14. Figure 8 Enabling feature whe n provisioning storageAlternatively, this feature can be enabled after the storage resource has been created bychoosing the storage resource and selecting Details > Deduplication tab > Enable (shown inFigure 9). Figure 9 Enabling feature on existing storage EMC VNXe’s Data Deduplication and Compression – Overview 14
  • 15. Enabling this feature prompts an immediate scan of the storage resource. However, if anotherscan is already in progress on the storage server, it will not be initiated. Both Enable andEnable in the Suspend mode options allows Windows CIFS clients to make use of the WindowsExplorer advanced attribute, "Compress content to save disk space".After you enable a storage resource for deduplication, File Deduplication and Compressionscans it periodically and looks for more files to deduplicate. You can view the state of thededuplication process in the storage resource’s Deduplication tab.The default File Deduplication and Compression state for a storage resource is ”Disabled”.When a storage resource is in the “Disabled” state, it has no deduplicated files and the policyengine does not scan for files to deduplicate.The “Enabled” state indicates that the File Deduplication and Compression processing isenabled for the storage resource. When a storage resource is in the “Enabled” state, it maycontain deduplicated files, and the policy engine scans the source resource for more files todeduplicate on its next scheduled run.The “Suspended” state means that the File Deduplication and Compression processing issuspended for the storage resource. When a source resource is in the “Suspended” state, itmay contain deduplicated files. However, the policy engine does not scan to look for any morefiles to deduplicate.Administrators can switch between deduplication states at any time. If they switch thededuplication state from “Enabled” or “Suspended” to “Disabled”, it prompts the system toreduplicate all deduplicated files in the source resource (shown in Figure 10). The systemchecks whether there is sufficient space available in the source resource to complete thisprocess (without filling the source resource completely) before accepting the request to turndeduplication “Disabled”. Figure 10 Message to reduplicate all de duplicated filesUsers can initiate a deduplicate scan manually by selecting the “Force Rescan Now” button. EMC VNXe’s Data Deduplication and Compression – Overview 15
  • 16. If the space is not sufficient, the system informs the administrator about the amount ofadditional space that is required to complete the operation, and recommends the extension ofthe storage resource.Deduplication SettingsIn Unisphere, policy settings are configured at the storage resource level. The Deduplication tabis found in the Details page of the storage resource. There are two policy criteria that areconfigurable by the administrator: • Excluded File Extensions - If certain file extensions need to be excluded from the deduplication process, set up the file extension exclusion list before running deduplication for the first time. If you add exclusions after deduplication has been run on a storage resource, the new exclusions take effect from that point forward and do not affect files that were deduplicated in previous scans. File extensions must include the leading dot. • Excluded Paths - If you want to exclude pathnames from the deduplication process, add the pathname exclusion list before running deduplication for the first time. If exclusions are added after deduplication has been initiated on a storage resource, these exclusions take effect from that point forward. Files that were deduplicated in previous scans are not affected. The forward slash (/) is a valid directory delimiter within a single pathname.Deduplication Status and StatisticsAs shown in Figure 11, VNXe displays the results of the deduplication process on the data inthe source resource with the following statistics:• State - Enabled or disabled.• Status – Running, In Progress, or Idle.• Last Scan - When the last successful scan of the storage resource was completed.• Files deduplicated - Number of files processed by the deduplication policy engine to save space. It also shows the percentage of deduplicated files.• Saved – Amount (before and after), and percentage of space saved by deduplication. EMC VNXe’s Data Deduplication and Compression – Overview 16
  • 17. Figure 11 Status of Deduplication ScanWhen scanning a storage resource enabled for deduplication the first time, the statistics shownare reported in real time. After the first scan, statistics are reported as static values based onthe last successful scan.Client input/output to space-reduced filesThe File Deduplication and Compression feature does not affect the client input/output (I/O) tofiles that have not been deduplicated. The feature does not introduce any additional overheadfor access to files that it has not processed. The default policy is designed to filter out files thathave frequent I/O access and thus avoid adversely affecting the time required to access thosefiles.Read accessRead access to deduplicated files is satisfied by decompressing the data in memory andpassing it back to the client. VNXe does not decompress or alter any data on disk in responseto client-read activity. In addition, random reads require decompression of the requestedportion of the file data and not of the entire file data. Reading a file that is not deduplicated cantake longer than reading a deduplicated file, because of the decompression activity. However,the opposite may also be true. Reading a deduplicated file is sometimes faster than reading afile that is not deduplicated, because less data needs to be read from the disk, which morethan offsets the increased CPU activity associated with decompressing the data. EMC VNXe’s Data Deduplication and Compression – Overview 17
  • 18. Write accessA client request to write to or modify a deduplicated file causes the requested portion of the fileto reduplicate (decompress) in the storage resource. A write to or a modification of adeduplicated file causes a copy of the requested portion of the file data to be reduplicated forthis particular instance while preserving the deduplicated data for the remaining references tothis file. The following three factors mitigate this effect:• Most applications do not modify files. They typically make a local copy, modify it, and when finished, write the entire new file back to the file server, discarding the old copy in the process. Therefore, reduplication on the file server does not occur; it is just replaced.• VNXe avoids processing active files (accessed or modified recently) based on policy definitions. Hence, deduplicated files are less likely to be modified and, if they are, performance is less likely to be a critical factor.• When a client writes to a deduplicated file, the SP writes only the individual blocks of text that have changed. The entire file is not decompressed and reduplicated on disk until the number of individual changed blocks plus the number of blocks in the corresponding deduplicated file is larger than the logical file size.Interoperability with other VNXe featuresNetwork and VNXe NDMP PAX (tar and dump)When backing up over the network through CIFS or NFS, space-reduced files are filtered for theircompressibility. This determines if the space-reduced files are to be backed up in theircompressed format or in their original form. This is done so as not to affect the restore time onfiles that provide minimal amount of savings. Space-reduced files are not reduplicated to theiroriginal size for transfer to the backup application, and they are not reduplicated on disk. Thespace-saving storage efficiency benefits realized in the VNXe deduplicated production storagedo not flow through to backups when using network-based backups. When backing updeduplication-enabled storage with NDMP PAX, the compressed version of the file is backed upand all instances of the deduplicated file appear in the backup stream. It is worth noting that inmost cases the majority of space savings come from compression, which means that themajority of the space savings made in storage resource apply to the backup as well.In NDMP PAX backups, a threshold is used to check on the compressibility of each space-reduced file before sending the file data compressed. There are cases when it is better to backup the file uncompressed, especially when there is sparsely-written data associated with thefile. By default, the deduplicated size of the file must be 90 percent (or less) than the originalsize of the file at the time it was deduplicated.This ensures that compressible files are backed up in their compressed format, and non-compressible files are backed up in their un-deduplicated format. (The small amount of spacethat would be saved on tape does not justify the extra amount it would take to restore the filesfrom backup.) EMC VNXe’s Data Deduplication and Compression – Overview 18
  • 19. When restoring files from a PAX-based NDMP backup, the files are restored in their compressedformat. This feature also provides right-sizing capabilities because an administrator can backup the deduplicated files from one storage resource and restore them onto a smaller storageresource. This feature is independent of the backup applications.Point-in-time views of the storageThe deduplication process releases space in the production storage immediately. However,blocks may be copied to the protection storage in the process. Deduplicating data associatedwith a file involves copying the data within the storage resource to the hidden and reduceddata store, so that file-level deduplication and compression can be applied to it.Because snapshots use a “copy on first modify” principle, the blocks that get deduplicated arecopied to the protection storage first. This is done to preserves the previous snapshot’s point-in-time view of the storage resource. These blocks are freed when the corresponding snapshotgets deleted or refreshed, and are available for reuse by other snapshots. It is difficult topredict the number of blocks that will be copied to the protection storage during deduplication,because this depends on how full the primary storage is and how quickly it is changing.By default, VNXe is configured to abort deduplication operations on a storage resource before itcauses the protection storage to extend. This avoids the protection storage expanding due todeduplication activity. If the deduplication process is aborted in this way, an alert is generatedthat explains what happened. The VNXe administrator can choose to extend the protectionstorage or simply let the deduplication process execute again on its next scheduled run.ReplicationDeduplicating the contents of a storage resource before you replicate it can greatly reduce theamount of data that is sent over the network as part of the initial-baseline copy process. Theamount of data is transferred over the network depends on the timing of replication updatesand when deduplication runs.In all but the most extreme circumstances, replication updates are more frequent thandeduplication scans of a storage resource. This means that new and changed data in thestorage resource is usually replicated in its non-deduplicated form first. Furthermore, anysubsequent deduplication of that data prompts additional replication traffic; this is due to theblock changes that were made to the hidden data store of the storage resource.This is true of any deduplication solution that processes replicated data and updates remotereplicas of the data, because the replication updates occur more frequently than deduplicationscans. The space savings realized by the production storage resource is reflected on thedestination storage resource. EMC VNXe’s Data Deduplication and Compression – Overview 19
  • 20. VNXe File-Level RetentionYou can enable File Deduplication and Compression on enterprise File-Level Retention (FLR-E)storage without compromising on the protection offered to the data that the storage resourcescontain.File Deduplication and Compression in a VMware EnvironmentFile Deduplication and Compression can be used in a VMware environment to improve storageefficiency of the virtual machines (VMs). Active VMs are constantly changing states and aretypically bypassed by VNXe’s deduplication policy engine.VMs that are selected for storage optimization are compressed using the same compressionalgorithm that is used in File Deduplication and Compression. The algorithm compresses theVMDK file (virtual disk) and leaves the other files that make up the VMs intact. When acompressed VMDK is read, only the data blocks containing the actual data are decompressed.Likewise, when data is written to the VMDK, instead of compressing it “on the fly”, data iswritten to a set-aside buffer, which then gets compressed as a background task. In mostsituations, the amount of data read or written is a small percentage of the total amount of datastored.Deployment considerationsThe following need to be considered when deploying File Deduplication and Compression:• File Deduplication and Compression does not deduplicate data across or between storage resources.• With NDMP PAX, backing up a deduplication-enabled storage results in a shorter time window. However, the restore window required increases. File Deduplication and Compression does not process or affect alternate data streams (also known as named attributes) associated with files and directories in the storage resource.• File Deduplication and Compression does not process files less than 24 KB in size. The overhead associated with processing such files negates any space savings achieved.• There is no limit to the size of a storage resource on which File Deduplication and Compression can be enabled. However, the storage resource must have at least 1 MB of free space before deduplication can be enabled. If there is not enough free space, an error message is generated and the server log is updated.• You can enable File Deduplication and Compression on any number of storage resources.• You can enable File Deduplication and Compression on existing Shared-Folder and VMware NFS-datastore storage. This feature cannot be applied to iSCSI storage hosted on the VNXe system. EMC VNXe’s Data Deduplication and Compression – Overview 20
  • 21. ConclusionThe File Deduplication and Compression feature adds to VNXe’s impressive storage efficiencycapabilities by intelligently reducing space usage and providing further storage efficiency. Youcan optimize the use of the existing storage environment for storage resource data through file-level deduplication and compression.A combination of file-level deduplication and compression represents the best technique toprovide maximum benefit for the resources consumed. This feature builds on VNXe’s ease ofuse by providing a single-click option in Unisphere to enable deduplication for each storage.File Deduplication and Compression also supports almost all Unisphere features and works inaccordance with maximum-supported VNXe limits.ReferencesThe following whitepapers can be found on the VNXe page on Powerlink:• EMC VNXe Series Storage Systems – A Detailed Review• EMC VNXe Data Protection – Overview EMC VNXe’s Data Deduplication and Compression – Overview 21