How does NetApp Dedupe work?

When you enable SIS on a volume, the behaviour of that volume changes, and the change takes place in two phases.

TWO PHASE PROCESS:

PHASE 1 -> SIS enabled: Pre-process: Before the block is written to the array: Collecting fingerprints

Note: This is true for new blocks. For existing data blocks that were written before SIS was enabled, you need to run a scan on the existing data and pull those fingerprints into the catalogue.

PHASE 2 -> SIS start: Post-process: After the block is written to the array: Sorting, comparing and deduping

PHASE 1:

The moment SIS is enabled: Every time SIS notices a block write request coming in, the SIS process makes a call to Data ONTAP to get a copy of the fingerprint for that block so that it can store this fingerprint in its catalogue file.
Note: This request interrupts the write string and results in a 7% performance penalty for all writes into any volume with SIS enabled.

PHASE 2:

Now, at some point you'll want to dedupe the volume using the sis start command, either manually, automatically, or via a schedule: SIS goes through the process of comparing fingerprints from the fingerprint database catalogue file, validating data, and deduping blocks that pass the validation phase.

Note: In the end all we are really doing is adjusting some inode metadata to say "hey, remember that data that used to be here? Well, it's over there now."

IMPORTANT: Nothing about the basic data structure of the WAFL file system has changed, except that you are traversing a different path in the file structure to get to your desired data block. That's why NetApp dedupe *usually* has no perceivable impact on read performance - all we've done is redirect some block pointers. Accessing your data might go a little faster, a little slower, or more likely not change at all - it all depends on the pattern of the file system data structure and the pattern of requests coming from the application.
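To make the two phases above concrete, here is a minimal toy sketch in Python. It is not NetApp code and does not reflect how WAFL or SIS is actually implemented: the ToyVolume class, its write/read/sis_start methods, and the deliberately weak 16-bit fingerprint function are all hypothetical names invented for illustration. The weak fingerprint is chosen on purpose so that two different blocks can share a fingerprint, which shows why the validation step matters before any pointer is redirected.

```python
from itertools import groupby


def fingerprint(block: bytes) -> int:
    """Toy stand-in for the per-block checksum (deliberately weak, 16 bits)."""
    return sum(block) & 0xFFFF


class ToyVolume:
    """Hypothetical model of a volume: not WAFL, just dicts and a list."""

    def __init__(self):
        self.blocks = {}     # physical block number -> data
        self.pointers = {}   # logical id -> physical block number
        self.catalogue = []  # (fingerprint, physical block number)
        self.next_pbn = 0

    def write(self, logical_id, data: bytes):
        # Phase 1: store the block and record a copy of its fingerprint.
        pbn = self.next_pbn
        self.next_pbn += 1
        self.blocks[pbn] = data
        self.pointers[logical_id] = pbn
        self.catalogue.append((fingerprint(data), pbn))

    def read(self, logical_id) -> bytes:
        return self.blocks[self.pointers[logical_id]]

    def sis_start(self) -> int:
        # Phase 2: sort the catalogue so equal fingerprints sit together,
        # validate candidates, then redirect pointers and free duplicates.
        freed = 0
        kept = []
        for fp, group in groupby(sorted(self.catalogue), key=lambda e: e[0]):
            originals = []  # validated unique blocks for this fingerprint
            for _, pbn in group:
                # Validation: a fingerprint match is only a candidate,
                # so compare the actual block contents.
                match = next(
                    (o for o in originals if self.blocks[o] == self.blocks[pbn]),
                    None,
                )
                if match is None:
                    originals.append(pbn)
                    kept.append((fp, pbn))
                    continue
                # Redirect every pointer at the duplicate to the original.
                for logical_id, target in self.pointers.items():
                    if target == pbn:
                        self.pointers[logical_id] = match
                del self.blocks[pbn]  # duplicate block is free again
                freed += 1
        self.catalogue = kept
        return freed


vol = ToyVolume()
vol.write("a", b"hello")
vol.write("b", b"hello")  # true duplicate of "a"
vol.write("c", b"olleh")  # same toy fingerprint, different bytes
print(vol.sis_start())    # prints 1: only the true duplicate is freed
```

After sis_start, "a" and "b" point at the same physical block while "c" keeps its own, and reads return the same data as before. Only the pointers moved, which is the point of the IMPORTANT note above.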

What is a fingerprint?

A fingerprint is a small digital representation of a larger data object. Basically, it is a checksum character generated by WAFL for each BLOCK for the purpose of consistency checking (this generally involves the creation of a hash).

Is the fingerprint generated by SIS?

No. Each time a WAFL block is created, a checksum character is generated for the purpose of consistency checking. NetApp Deduplication (SIS) simply "borrows" a copy of this checksum and stores it in a catalogue as the fingerprint.

What happens during post-process dedupe?

A. The fingerprint catalogue is sorted and searched for identical fingerprints.
B. When a fingerprint "match" is made, the associated data blocks are retrieved and compared byte-for-byte.
C. Assuming successful validation, the inode pointer metadata of the duplicate block is redirected to the original block.
D. The duplicate block is marked as "Free" and returned to the system, eligible for re-use.

When to use QSM vs. VSM on dedupe volumes?

Use QSM (Qtree SnapMirror) when you only want to dedupe the destination volume. Use VSM (Volume SnapMirror) when you want to dedupe both the source and destination volumes automatically and save bandwidth during SnapMirror transfers.

Courtesy: Dr. Dedupe, NetApp.
Prepared by: firstname.lastname@example.org