NetApp Deduplication
 Deduplication refers to the elimination of redundant
data in storage. In the deduplication process, duplicate
data is deleted, leaving only one copy of the data to be
stored; however, an index of all data is still retained
should that data ever be required. Deduplication
reduces the required storage capacity because only
unique data is stored.
NetApp deduplication provides block-level deduplication within the entire flexible
volume. Essentially, deduplication removes duplicate blocks, storing only unique blocks in
the flexible volume, and it creates a small amount of additional metadata in the process.
 Notable features of deduplication include:
1. It works with a high degree of granularity: that is, at the 4KB block level.
2. It operates on the active file system of the flexible volume.
3. It is a background process that can be configured to run automatically, be
scheduled, or be run manually through the command-line interface (CLI) or
NetApp System Manager.
4. It is enabled and managed by using a simple CLI or a GUI such as System
Manager, as sketched below.
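For illustration, here is how deduplication is typically enabled and managed from the 7-Mode CLI (a minimal sketch; the volume name vol1 is a placeholder):

    sis on /vol/vol1                     # enable deduplication on the flexible volume
    sis config -s sun-sat@23 /vol/vol1   # schedule a nightly run at 23:00
    sis status /vol/vol1                 # check the state and progress of deduplication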
HOW DEDUPLICATION WORKS
 The core enabling technology of deduplication is fingerprints: unique
digital signatures for every 4KB data block in the flexible volume.
 When deduplication runs for the first time on a flexible volume with
existing data, it scans the blocks in the flexible volume and creates a
fingerprint database, which contains a sorted list of all fingerprints for the
used blocks in the flexible volume. After the fingerprint file is created, the
fingerprints are checked for duplicates; when a match is found, a
byte-by-byte comparison of the blocks is done first to make sure that the
blocks are indeed identical. If they are identical, the duplicate block's
pointer is updated to reference the existing data block, the duplicate data
block is released, and the inode is updated.
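Running that first scan against existing data is done with the -s option of sis start (a sketch; the volume name is a placeholder):

    sis start -s /vol/vol1   # scan existing blocks and build the fingerprint database
    df -s /vol/vol1          # report the space savings once the run completes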
HOW DEDUPLICATION WORKS
 When you enable SIS (deduplication) on a volume, the behavior of that
volume changes, and the change takes place in two phases:
 PHASE 1:
 SIS enabled (pre-process): before a block is written to the array, its
fingerprint is collected.
 Note: This applies to new blocks. For the existing data blocks that were
written before SIS was enabled, a scan must be run on the existing data
to pull those fingerprints into the catalogue.
 PHASE 2:
 SIS start (post-process): after the blocks are written to the array, sorting,
comparing, and deduplicating take place.
Phase 1
From the moment SIS is enabled, every time SIS notices an incoming block
write request, the SIS process makes a call to Data ONTAP to get a copy of
the fingerprint for that block so that it can store the fingerprint in its
catalogue file.
Note: This request interrupts the write stream and results in roughly a 7%
performance penalty for all writes to any volume with SIS enabled.
 Phase 2
At some point you deduplicate the volume, either manually with the
'sis start' command or automatically on a schedule.
SIS then goes through the process of comparing fingerprints from the
fingerprint database (catalogue) file, validating the data, and deduplicating
the blocks that pass the validation phase, as sketched below.
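Concretely, the post-process run is triggered like this (a sketch; the volume name is a placeholder):

    sis start /vol/vol1            # start post-process deduplication manually
    sis config -s auto /vol/vol1   # or let it trigger automatically as new data accumulates
    sis status -l /vol/vol1        # detailed status of the running operation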
Important Note
Nothing about the basic data structure of the WAFL file system
has changed; we are simply traversing a different path in the file
structure to get to the desired data block. That is why NetApp
dedupe usually has no perceivable impact on read performance.
All we have done is redirect some block pointers. Accessing your
data might go a little faster, a little slower, or, more likely, not
change at all. It all depends on the pattern of the file system data
structure and the pattern of requests coming from the
application.
What is a Fingerprint?
A fingerprint is a small digital representation of a larger data object.
Essentially, it is the checksum that WAFL generates for each block for
the purpose of consistency checking.
Is the fingerprint generated by SIS?
No. Each time a WAFL block is written, a checksum is generated for
the purpose of consistency checking. NetApp deduplication (SIS) simply
borrows a copy of this checksum and stores it in a catalogue as the fingerprint.
What happens during post-process deduplication?
 The fingerprint catalogue is sorted and searched for identical
fingerprints.
 When a fingerprint match is found, the associated data blocks
are retrieved and compared byte by byte.
 Assuming successful validation, the inode pointer metadata of
the duplicate block is redirected to the original block.
 The duplicate block is marked as “free” and returned to the
system, eligible for reuse.
Volume or Data Constituent & Aggregate Deduplication Overhead
 For each volume with deduplication enabled, up to 4% of the physical
amount of data written to that volume is required to store the
volume deduplication metadata.
 For each aggregate that contains any volumes with deduplication enabled,
up to 3% of the physical amount of data contained in all of those
deduplication-enabled volumes within the aggregate is required to store
the aggregate deduplication metadata.
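An illustrative calculation with hypothetical numbers: if 1TB of physical data has been written to a deduplicated volume, plan for up to 1TB x 4% = 40GB of volume deduplication metadata inside that volume, plus up to 1TB x 3% = 30GB of aggregate deduplication metadata in the containing aggregate.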
Thin and Thick Provisioning
Thin Provisioning
Definition: A thin-provisioned volume is a volume for which storage is not
set aside up front. Instead, storage for the volume is allocated as it is
needed.
The storage architecture uses aggregates to virtualize the physical storage
into pools for logical allocation. The volumes and LUNs see the logical
space, and the aggregate controls the physical space. This architecture
provides the flexibility to create multiple volumes and LUNs that can exceed
the physical space available in the aggregate. All volumes and LUNs in the
aggregate use the available storage within the aggregate as a shared
storage pool. This allows space in the aggregate to be allocated efficiently
as data is written to it, rather than preallocated (reserved) up front; this
approach is called thin provisioning.
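A minimal 7-Mode sketch of creating a thin-provisioned volume by setting its space guarantee to none (the volume and aggregate names are placeholders):

    vol create thin_vol -s none aggr1 500g   # 500g visible to clients, nothing reserved up front
    vol options thin_vol                     # verify the guarantee setting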
Thick Provisioning
 Definition: In virtual storage, thick provisioning is a type of storage
allocation in which the amount of storage capacity on a volume is
preallocated on physical storage (the aggregate) at the time the volume
is created.
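By contrast, a thick-provisioned volume reserves its full size in the aggregate immediately (same placeholder names as above):

    vol create thick_vol -s volume aggr1 500g   # the full 500g is reserved in aggr1 at creation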
Multi-Tenancy: What is it?
Secure Multi-Tenancy – Definition
 Supporting multiple “tenants” (users, customers, etc.) from a single shared
infrastructure while keeping all data isolated and secure
 Customers concerned with security and privacy require secure
multi-tenancy:
– Government agencies
– Financial companies
– Service Providers
– Etc.
Multi-Tenancy and Cloud Infrastructure
Secure Multi-Tenancy for Virtualized Environments
 Solution
 The only validated solution to support end-to-end multi-tenancy
across applications and data
 Data is securely isolated from the virtual server, through the network,
to the virtual storage
Introducing MultiStore
MultiStore and vFiler
 MultiStore is a logical partitioning of the network and storage resources
in Data ONTAP, providing a secure storage consolidation solution.
 When enabled, the MultiStore license creates a logical unit
called vFiler0, which contains all of the storage and network
resources of the physical FAS unit. Additional vFiler units can then
be created with storage and network resources assigned
specifically to them, as sketched below.
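A minimal sketch of carving out a vFiler unit in 7-Mode (the IPspace name, IP address, and volume path are placeholders):

    ipspace create ips_tenant1                                        # dedicated routing domain for the tenant
    ipspace assign ips_tenant1 e0b                                    # bind an interface to the IPspace
    vfiler create vfiler1 -s ips_tenant1 -i 10.1.1.10 /vol/vf1_root   # create the vFiler with its root volume
    vfiler status -a                                                  # verify vfiler0 and vfiler1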
What is a vFiler?
 A vFiler unit is a lightweight instance of the Data ONTAP multiprotocol
server; all system resources are shared between vFiler units.
 The storage units in a vFiler are FlexVol volumes and qtrees.
 The network units are IP addresses, VLANs, VIFs, aliases, and IPspaces.
 vFiler units are not hypervisors: a vFiler unit's resources cannot be
accessed or discovered by any other vFiler unit.
MultiStore configuration:
 Up to 65 secure partitions (vFiler units) on a single storage system
(64 + vFiler0)
 IP storage based (NFS, CIFS, and iSCSI servers)
 Additional storage and network resources can be moved, added, or deleted
 NFS, CIFS, iSCSI, HTTP, NDMP, FTP, SSH, and SFTP protocols are supported
– Protocols can be enabled or disabled per vFiler (see the sketch below)
– Destroying a vFiler does not destroy its data
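For example, protocols can be switched off for an individual vFiler (a sketch, assuming the 7-Mode vfiler allow/disallow syntax; the vFiler name is a placeholder):

    vfiler disallow vfiler1 proto=ftp proto=http   # block FTP and HTTP for this tenant
    vfiler allow vfiler1 proto=nfs                 # keep NFS enabled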
MultiStore: One Physical System, Multiple Virtual Storage Partitions
What Makes MultiStore Secure?
 MultiStore provides multiple layers of security
– IPspaces
– Administrative separation
– Protocol separation
– Storage separation
 An IPspace has a dedicated routing table
 Each physical interface (Ethernet port) or logical interface (VLAN) is
bound to a single IPspace
What Makes MultiStore Secure?
 A single IPspace may have multiple physical and logical interfaces bound
to it
 Each customer has a unique IPspace
 Use of VLANs or VIFs is a best practice with IPspaces, as sketched below
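A sketch of that best practice, binding a per-customer VLAN interface to a dedicated IPspace (the VLAN ID and names are placeholders):

    vlan create e0b 100                 # create the tagged VLAN interface e0b-100
    ipspace create ips_cust_a           # one IPspace per customer
    ipspace assign ips_cust_a e0b-100   # bind the VLAN interface to that IPspace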
File Services Consolidation
Application Hosting
Always-On Data Mobility
 No planned downtime (see the migration sketch below) for:
– Storage capacity expansion
– Scheduled maintenance outages
– Software upgrades
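This mobility comes from moving the vFiler unit itself, with its network identity and data, to another system. A minimal sketch (assuming the 7-Mode vfiler migrate syntax; names are placeholders), run on the destination system:

    vfiler migrate vfiler1@source_filer   # move vfiler1 from source_filer to this system
    vfiler status -a                      # confirm vfiler1 is now running locally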
Adding Mobility to Multi-Tenancy
Automated Disaster Recovery: DR Site
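A sketch of configuring and activating DR for a vFiler unit (assuming the 7-Mode vfiler dr syntax; names are placeholders), run on the DR system:

    vfiler dr configure vfiler1@prod_filer   # set up SnapMirror-based DR for vfiler1
    vfiler dr activate vfiler1@prod_filer    # bring the DR copy online after a disaster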
