NetApp Deduplication
 Deduplication refers to the elimination of redundant
data in storage. In the deduplication process, duplicate
data is deleted, leaving only one copy of the data to be
stored; however, an index of all data is still retained
should that data ever be required. Deduplication
reduces the required storage capacity because only
unique data is stored.
NetApp deduplication provides block-level deduplication within the entire flexible
volume. Essentially, deduplication removes duplicate blocks, storing only unique blocks in
the flexible volume, and it creates a small amount of additional metadata in the process.
 Notable features of deduplication include:
1. It works with a high degree of granularity: that is, at the 4KB block level.
2. It operates on the active file system of the flexible volume.
3. It is a background process that can be configured to run automatically, be
scheduled, or be run manually through the command-line interface (CLI) or
NetApp System Manager.
4. It is enabled and managed by using a simple CLI or a GUI such as System
Manager, as sketched below.
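For illustration, here is how deduplication is typically enabled and managed from the 7-Mode CLI (a minimal sketch; the volume name vol1 is a placeholder):

    sis on /vol/vol1                     # enable deduplication on the flexible volume
    sis config -s sun-sat@23 /vol/vol1   # schedule a nightly run at 23:00
    sis status /vol/vol1                 # check the state and progress of deduplication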
HOW DEDUPLICATION WORKS
 The core enabling technology of deduplication is fingerprints: unique
digital signatures for every 4KB data block in the flexible volume.
 When deduplication runs for the first time on a flexible volume with
existing data, it scans the blocks in the flexible volume and creates a
fingerprint database, which contains a sorted list of all fingerprints for the
used blocks in the flexible volume. After the fingerprint file is created, the
fingerprints are checked for duplicates; when a match is found, a
byte-by-byte comparison of the blocks is done first to make sure that the
blocks are indeed identical. If they are identical, the duplicate block's
pointer is updated to reference the existing data block, the duplicate data
block is released, and the inode is updated.
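Running that first scan against existing data is done with the -s option of sis start (a sketch; the volume name is a placeholder):

    sis start -s /vol/vol1   # scan existing blocks and build the fingerprint database
    df -s /vol/vol1          # report the space savings once the run completes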
HOW DEDUPLICATION WORKS
 When you enable SIS (deduplication) on a volume, the behavior of that
volume changes, and the change takes place in two phases:
 PHASE 1:
 SIS enabled (pre-process): before a block is written to the array, its
fingerprint is collected.
 Note: This applies to new blocks. For the existing data blocks that were
written before SIS was enabled, a scan must be run on the existing data
to pull those fingerprints into the catalogue.
 PHASE 2:
 SIS start (post-process): after the blocks are written to the array, sorting,
comparing, and deduplicating take place.
Phase 1
From the moment SIS is enabled, every time SIS notices an incoming block
write request, the SIS process makes a call to Data ONTAP to get a copy of
the fingerprint for that block so that it can store the fingerprint in its
catalogue file.
Note: This request interrupts the write stream and results in roughly a 7%
performance penalty for all writes to any volume with SIS enabled.
 Phase 2
At some point you deduplicate the volume, either manually with the
'sis start' command or automatically on a schedule.
SIS then goes through the process of comparing fingerprints from the
fingerprint database (catalogue) file, validating the data, and deduplicating
the blocks that pass the validation phase, as sketched below.
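Concretely, the post-process run is triggered like this (a sketch; the volume name is a placeholder):

    sis start /vol/vol1            # start post-process deduplication manually
    sis config -s auto /vol/vol1   # or let it trigger automatically as new data accumulates
    sis status -l /vol/vol1        # detailed status of the running operation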
Important Note
Nothing about the basic data structure of the WAFL file system
has changed; we are simply traversing a different path in the file
structure to get to the desired data block. That is why NetApp
dedupe usually has no perceivable impact on read performance.
All we have done is redirect some block pointers. Accessing your
data might go a little faster, a little slower, or, more likely, not
change at all. It all depends on the pattern of the file system data
structure and the pattern of requests coming from the
application.
What is a Fingerprint?
A fingerprint is a small digital representation of a larger data object.
Essentially, it is the checksum that WAFL generates for each block for
the purpose of consistency checking.
Is the fingerprint generated by SIS?
No. Each time a WAFL block is written, a checksum is generated for
the purpose of consistency checking. NetApp deduplication (SIS) simply
borrows a copy of this checksum and stores it in a catalogue as the fingerprint.
What happens during post-process deduplication?
 The fingerprint catalogue is sorted and searched for identical
fingerprints.
 When a fingerprint match is found, the associated data blocks
are retrieved and compared byte by byte.
 Assuming successful validation, the inode pointer metadata of
the duplicate block is redirected to the original block.
 The duplicate block is marked as “free” and returned to the
system, eligible for reuse.
Volume or Data Constituent & Aggregate Deduplication Overhead
 For each volume with deduplication enabled, up to 4% of the physical
amount of data written to that volume is required to store the
volume deduplication metadata.
 For each aggregate that contains any volumes with deduplication enabled,
up to 3% of the physical amount of data contained in all of those
deduplication-enabled volumes within the aggregate is required to store
the aggregate deduplication metadata.
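An illustrative calculation with hypothetical numbers: if 1TB of physical data has been written to a deduplicated volume, plan for up to 1TB x 4% = 40GB of volume deduplication metadata inside that volume, plus up to 1TB x 3% = 30GB of aggregate deduplication metadata in the containing aggregate.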
Thin and Thick Provisioning
Thin Provisioning
Definition: A thin-provisioned volume is a volume for which storage is not
set aside up front. Instead, storage for the volume is allocated as it is
needed.
The storage architecture uses aggregates to virtualize the physical storage
into pools for logical allocation. The volumes and LUNs see the logical
space, and the aggregate controls the physical space. This architecture
provides the flexibility to create multiple volumes and LUNs that can exceed
the physical space available in the aggregate. All volumes and LUNs in the
aggregate use the available storage within the aggregate as a shared
storage pool. This allows space in the aggregate to be allocated efficiently
as data is written to it, rather than preallocated (reserved) up front; this
approach is called thin provisioning.
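A minimal 7-Mode sketch of creating a thin-provisioned volume by setting its space guarantee to none (the volume and aggregate names are placeholders):

    vol create thin_vol -s none aggr1 500g   # 500g visible to clients, nothing reserved up front
    vol options thin_vol                     # verify the guarantee setting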
Thick Provisioning
 Definition: In virtual storage, thick provisioning is a type of storage
allocation in which the amount of storage capacity on a volume is
preallocated on physical storage (the aggregate) at the time the volume
is created.
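By contrast, a thick-provisioned volume reserves its full size in the aggregate immediately (same placeholder names as above):

    vol create thick_vol -s volume aggr1 500g   # the full 500g is reserved in aggr1 at creation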
Multi-Tenancy: What is it?
Secure Multi-Tenancy – Definition
 Supporting multiple “tenants” (users, customers, etc.) from a single shared
infrastructure while keeping all data isolated and secure
 Customers concerned with security and privacy require secure
multi-tenancy:
– Government agencies
– Financial companies
– Service Providers
– Etc.
Multi-Tenancy and Cloud Infrastructure
Secure Multi-Tenancy for Virtualized Environments
 Solution
 The only validated solution to support end-to-end multi-tenancy
across applications and data
 Data is securely isolated from the virtual server, through the network,
to the virtual storage
Introducing MultiStore
MultiStore and vFiler
 MultiStore is a logical partitioning of the network and storage resources
in Data ONTAP, providing a secure storage consolidation solution.
 When enabled, the MultiStore license creates a logical unit
called vFiler0, which contains all of the storage and network
resources of the physical FAS unit. Additional vFiler units can then
be created with storage and network resources assigned
specifically to them, as sketched below.
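A minimal sketch of carving out a vFiler unit in 7-Mode (the IPspace name, IP address, and volume path are placeholders):

    ipspace create ips_tenant1                                        # dedicated routing domain for the tenant
    ipspace assign ips_tenant1 e0b                                    # bind an interface to the IPspace
    vfiler create vfiler1 -s ips_tenant1 -i 10.1.1.10 /vol/vf1_root   # create the vFiler with its root volume
    vfiler status -a                                                  # verify vfiler0 and vfiler1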
What is a vFiler?
 A vFiler unit is a lightweight instance of the Data ONTAP multiprotocol
server; all system resources are shared between vFiler units.
 The storage units in a vFiler are FlexVol volumes and qtrees.
 The network units are IP addresses, VLANs, VIFs, aliases, and IPspaces.
 vFiler units are not hypervisors: a vFiler unit's resources cannot be
accessed or discovered by any other vFiler unit.
MultiStore configuration:
 Up to 65 secure partitions (vFiler units) on a single storage system
(64 + vFiler0)
 IP storage based (NFS, CIFS, and iSCSI servers)
 Additional storage and network resources can be moved, added, or deleted
 NFS, CIFS, iSCSI, HTTP, NDMP, FTP, SSH, and SFTP protocols are supported
– Protocols can be enabled or disabled per vFiler (see the sketch below)
– Destroying a vFiler does not destroy its data
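For example, protocols can be switched off for an individual vFiler (a sketch, assuming the 7-Mode vfiler allow/disallow syntax; the vFiler name is a placeholder):

    vfiler disallow vfiler1 proto=ftp proto=http   # block FTP and HTTP for this tenant
    vfiler allow vfiler1 proto=nfs                 # keep NFS enabled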
MultiStore: One Physical System, Multiple Virtual Storage Partitions
What Makes MultiStore Secure?
 MultiStore provides multiple layers of security
– IPspaces
– Administrative separation
– Protocol separation
– Storage separation
 An IPspace has a dedicated routing table
 Each physical interface (Ethernet port) or logical interface (VLAN) is
bound to a single IPspace
What Makes MultiStore Secure?
 A single IPspace may have multiple physical and logical interfaces bound
to it
 Each customer has a unique IPspace
 Use of VLANs or VIFs is a best practice with IPspaces, as sketched below
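A sketch of that best practice, binding a per-customer VLAN interface to a dedicated IPspace (the VLAN ID and names are placeholders):

    vlan create e0b 100                 # create the tagged VLAN interface e0b-100
    ipspace create ips_cust_a           # one IPspace per customer
    ipspace assign ips_cust_a e0b-100   # bind the VLAN interface to that IPspace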
File Services Consolidation
Application Hosting
Always-On Data Mobility
 No planned downtime (see the migration sketch below) for:
– Storage capacity expansion
– Scheduled maintenance outages
– Software upgrades
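This mobility comes from moving the vFiler unit itself, with its network identity and data, to another system. A minimal sketch (assuming the 7-Mode vfiler migrate syntax; names are placeholders), run on the destination system:

    vfiler migrate vfiler1@source_filer   # move vfiler1 from source_filer to this system
    vfiler status -a                      # confirm vfiler1 is now running locally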
Adding Mobility to Multi-Tenancy
Automated Disaster Recovery: DR Site
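A sketch of configuring and activating DR for a vFiler unit (assuming the 7-Mode vfiler dr syntax; names are placeholders), run on the DR system:

    vfiler dr configure vfiler1@prod_filer   # set up SnapMirror-based DR for vfiler1
    vfiler dr activate vfiler1@prod_filer    # bring the DR copy online after a disaster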
