FAQ on SnapDiff
March, 2016
ashwinwriter@gmail.com
What is SnapDiff?
SnapDiff is NetApp's proprietary crawler, or indexer. You can think of it as a 'fast indexing
engine'. It takes two snapshots, a base snapshot and a diff snapshot, as input and
returns a list of all files that have been added, deleted, modified, or renamed between the two
snapshots.
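As a rough illustration of the idea (not NetApp's implementation, which is proprietary), the comparison can be pictured as a diff of two inode-to-file maps. The function name, the dictionary layout, and the use of mtime as the "modified" signal are all assumptions made for this sketch.

```python
def diff_snapshots(base, diff):
    """Compare two {inode: (path, mtime)} maps and classify the changes.

    Hypothetical data model for illustration only; the real engine works on
    WAFL snapshot metadata, not Python dictionaries.
    """
    changes = {"added": [], "deleted": [], "modified": [], "renamed": []}
    for inode, (path, mtime) in diff.items():
        if inode not in base:
            changes["added"].append(path)      # inode exists only in the diff snapshot
            continue
        old_path, old_mtime = base[inode]
        if old_path != path:
            changes["renamed"].append((old_path, path))
        if old_mtime != mtime:
            changes["modified"].append(path)
    for inode, (path, _mtime) in base.items():
        if inode not in diff:
            changes["deleted"].append(path)    # inode exists only in the base snapshot
    return changes
```

The point of the sketch is that the work is proportional to the number of inodes compared, not to the depth of the directory tree.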
What is the SnapDiff API?
The SnapDiff Application Programming Interface (API) is the way third-party backup software
vendors integrate their backup solutions with the SnapDiff engine.
For example:
1. IBM - TSM
2. CommVault [IntelliSnap]
3. Bacula
4. Symantec NetBackup
Why is SnapDiff needed?
Traditionally, NAS volumes are backed up either over the network (NFS/CIFS), via an NFS mount
or a Windows mount, or by using the industry-standard NDMP protocol.
In traditional file system backups, a file system crawler performs a tree walk to identify the
files and directories that have changed since the last full backup.
A file system crawler is a piece of software that walks down a file system tree and gathers
information about each subdirectory and file within that tree. This is a time-consuming and
resource-intensive [CPU and memory] process for the host machine.
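The tree walk described above can be sketched in a few lines of Python. This is a minimal illustration of a generic crawler, not the code of any particular backup product; the function names are made up for the example.

```python
import os


def crawl(root):
    """Walk the tree under root and return {path: (size, mtime)} for every file."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # one metadata lookup per file: this is the expensive part
            index[path] = (st.st_size, st.st_mtime)
    return index


def changed_since(old_index, new_index):
    """Return paths that are new, or whose size/mtime differ from the old index."""
    return sorted(p for p, meta in new_index.items() if old_index.get(p) != meta)
```

Every file is visited on every run, even when almost nothing has changed, which is why the cost grows with the number of files rather than with the number of changes.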
Even though NDMP supports incremental, snapshot-based backups [using either an inode file walk
or a tree walk], it can still be painful depending upon the number of small files [in millions]
and the depth of the directory structure.
Slow: During an NDMP incremental backup, file attribute information has to be sent to
the NDMP Media Agent [DMA]. This is then compared with the index that the Media Agent built
during the full backup. The Media Agent then generates the list of files that need to be backed up
for the incremental backup. This is a single-stream process and can be time consuming, depending
upon the number of small files changed in that incremental iteration. And this is only the
indexing part; the data also needs to be copied out of the snapshots to external storage.
Disadvantages of NDMP backups:
a. For terabytes and petabytes of storage, you can only do 9 incrementals with traditional NDMP
backups; after that you once again face the nightmare of long backup windows for
full NDMP dumps, sometimes running into days, if not weeks.
Clustered Data ONTAP 8.3 onwards supports 32 levels of NDMP dump backups for NetApp.
Level 0 is a full backup.
Levels 1 through 31 are incremental backups.
However, after 31 incremental NDMP backups, the nightmare of running a full backup returns.
SnapDiff, in contrast, is incremental forever.
b. NDMP dump is always streamed out of the storage to secondary storage media, either disk
or tape. In other words, even if you have a manageably small number of files and changed blocks,
they still need to be copied out to external media.
SnapDiff is a fast indexer/crawler that quickly identifies the file and directory differences
between two Snapshot copies using an inode walk, rather than a traditional tree walk.
With the IntelliSnap option, you can choose to keep the primary snaps on the NetApp itself, which
means the snapshot creation job is only a 2-second task. The rest of the time is spent in
'SnapDifferencing' the files and directories inside the snapshot. You can also defer the SnapDiff
to a later time, or even run SnapDiff on the secondary snap that has been replicated to the DR
box using either Mirror or Vault. If that was not enough, NetApp now allows CommVault to do
Live Browse, which enables admins to quickly mount the snapshot volume and access data in
native format on the fly without needing an index.
Is the SnapDiff API free to implement or is it licensed?
Only approved [backup] partners are given the accompanying documentation and guidelines on
using the SnapDiff APIs. Please see this post for more information:
https://community.netapp.com/t5/Software-Development-Kit-SDK-and-API-Discussions/snapdiff-API-documentation/td-p/42561
Can NDMP dump backups use the SnapDiff API?
No. NDMP dump backups are done via NDMP's native UNIX 'dump' engine, which is completely
different from the way the SnapDiff engine works.
Is SnapDiff applicable to both NAS & SAN protocols?
No. SnapDiff is only applicable to NAS [CIFS (SMB)/NFS] environment.
What do you mean by 'maxdiff' or 'SnapDiffMax'?
It is the number of file system differences requested in each query. The default is 256.
What is the purpose of 'SnapDiffMax' additional setting on the Media Agent?
The SnapDiffMax additional registry setting allows the NAS iDA to increase or decrease the number
of files to be indexed per query. However, before making any changes, it is recommended to contact
NetApp to verify that the file server configuration can support changes in the values of these
additional settings.
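To get a feel for what the setting changes, the number of queries needed to return a given number of differences can be estimated with simple arithmetic. This is a back-of-the-envelope sketch; the function name is made up, and it assumes every query returns a full batch except possibly the last.

```python
def queries_needed(changed_items, max_diffs=256):
    """Estimate how many queries are needed to return all differences.

    Assumes each query returns up to max_diffs items (256 is the default
    mentioned above) and that only the last batch may be partial.
    """
    if max_diffs <= 0:
        raise ValueError("max_diffs must be positive")
    # Integer ceiling division, so no floating-point rounding is involved.
    return (changed_items + max_diffs - 1) // max_diffs
```

For example, one million changed items need 3907 queries at 256 items per query and 977 queries at 1024 items per query. Fewer round trips is the potential gain; the load each larger query places on the filer is the reason to check with NetApp first.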
How is SnapDiff performance measured?
SnapDiff performance is not measured in throughput [MB/sec or GB/sec], but in terms of the
'number of files processed per hour', or in more technical language, 'Diffs/hour'. Please note,
SnapDiff is not a backup tool, it's an indexer.
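The metric itself is a simple rate. A minimal sketch, assuming you have counted the differences returned during a job and measured the elapsed time yourself (neither value is something this document says SnapDiff reports directly):

```python
def diffs_per_hour(diffs_processed, elapsed_seconds):
    """Convert a count of processed differences and a duration into Diffs/hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return diffs_processed * 3600 / elapsed_seconds
```

For instance, 500,000 differences processed in 30 minutes corresponds to 1,000,000 Diffs/hour.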
What factors govern the performance of the SnapDiff engine?
Performance is not affected by the size of the NetApp volumes in terms of terabytes or
petabytes. Nor do the sizes of the files catalogued have any effect on performance. However,
SnapDiff performance can degrade if the source volume contains many millions of files
spread over a "deep" (many levels) directory structure.
Which ONTAP version should the filer run for better SnapDiff performance?
Starting with Data ONTAP 8.2, regardless of the FAS architecture (Clustered
Data ONTAP or 7-Mode), SnapDiff performance has been greatly improved.
Can increasing the maxdiff registry value from the default 256 items to 1024 or more affect
the filer's performance?
Please note that SnapDiff is a low-priority process on a NetApp storage system. Processes that
support file sharing, such as NFS and/or CIFS, as well as replication functions like SnapVault
and SnapMirror, all run at a higher priority. If load on a storage system becomes a problem, there
are several ways of minimizing it, such as limiting the number of volumes selected for
SnapDiff from a single subclient.
What are the requirements for using the SnapDiff API with a NetApp filer?
The SnapDiff API requires XML over HTTP[S] and the volume language to be UTF-8.
The HTTP[S] connection is used only to transmit administrative data between the backup
software and the NetApp filer. The administrative session data includes information such as filer
credentials, snapshot information, and file names and attributes that are generated by the
snapshot differencing process. The HTTP[S] connection is not used to transmit normal file data
that is accessed on the filer by the client through file sharing protocols.
Ensure:
1. HTTP/HTTPS is enabled.
2. The volume language on the filer is set to UTF-8 [en_US.UTF-8].
3. The create_ucode and convert_ucode flags are turned on, especially if SnapDiff ZAPI is run on
volumes with NFS-only compatible names.
4. The account used for running SnapDiff on the filer has these role capabilities:
login-http-admin,api-*
How do backup applications interact with the SnapDiff engine?
Backup applications make use of the following SnapDiff Zephyr APIs [ZAPI], which are XML messages
transmitted over HTTP(S):
snapdiff-iter-start
snapdiff-iter-next
snapdiff-iter-end
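The three calls form a start / next / end iteration. The sketch below shows only that control flow, using an in-memory stand-in instead of a real filer. The class, its method names, the session handle, and the "empty batch means done" stop condition are all assumptions for illustration; the actual XML parameters are described in NetApp's partner documentation and are not reproduced here.

```python
class FakeFiler:
    """In-memory stand-in for a filer; a real client would send ZAPI XML over HTTP(S)."""

    def __init__(self, diffs):
        self._diffs = list(diffs)
        self._pos = 0
        self._max = 0
        self.open_sessions = 0

    def iter_start(self, volume, base_snapshot, diff_snapshot, max_diffs):
        # Stands in for snapdiff-iter-start: opens a session and fixes the batch size.
        self._pos = 0
        self._max = max_diffs
        self.open_sessions += 1
        return "session-1"  # hypothetical session handle

    def iter_next(self, session):
        # Stands in for snapdiff-iter-next: returns the next batch of differences.
        batch = self._diffs[self._pos:self._pos + self._max]
        self._pos += len(batch)
        return batch

    def iter_end(self, session):
        # Stands in for snapdiff-iter-end: releases the session on the filer.
        self.open_sessions -= 1


def fetch_all_diffs(filer, volume, base_snapshot, diff_snapshot, max_diffs=256):
    """Drive the start / next / end sequence and collect every difference."""
    session = filer.iter_start(volume, base_snapshot, diff_snapshot, max_diffs)
    try:
        results = []
        while True:
            batch = filer.iter_next(session)
            if not batch:  # assumed stop condition for this sketch
                break
            results.extend(batch)
        return results
    finally:
        filer.iter_end(session)  # always close, even if a fetch fails
```

Closing the session in a finally block matters because sessions are a limited resource on the controller.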
Is there a limit on concurrent SnapDiff sessions to a NetApp controller?
Data ONTAP limits the number of concurrent SnapDiff sessions to 16 per controller. When
planning NAS backups that perform indexing, it is recommended to distribute the schedules so
that no more than 16 jobs are running concurrently on a single controller.
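One simple way to respect that limit when planning schedules is to group jobs into waves, each holding at most 16 jobs per controller. This is a planning sketch only, with made-up names; it is not how CommVault schedules jobs.

```python
def plan_waves(jobs, limit=16):
    """Group (controller, volume) jobs into waves with at most `limit` jobs per controller.

    Each wave is a list of jobs intended to run concurrently.
    """
    waves = []
    for controller, volume in jobs:
        for wave in waves:
            running = sum(1 for c, _ in wave if c == controller)
            if running < limit:
                wave.append((controller, volume))
                break
        else:
            # Every existing wave is already full for this controller.
            waves.append([(controller, volume)])
    return waves
```

With 20 volumes on one controller, this yields a first wave of 16 jobs and a second wave of 4, while jobs for other controllers can still join the first wave.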
The SnapDiff starting point can be traced with the following lines in CVNasFileScan.log on the
Media Agent server:
3996 f0c 01/29 15:35:19 173477 ManageONTAP::GetVolLanguage Get volume language.
FileServer:[ONTAP7MODE] Volume:[CIFS]
3996 f0c 01/29 15:35:20 173477 SnapDiffStart FileServer:[ONTAP7MODE] Volume:[CIFS]
SnapOne:[SP_2_173476_31_1454081328] SnapTwo:[SP_2_173477_32_1454081692]
Format:[UTF8] Items In Each Iteration [256]
3996 f0c 01/29 15:35:24 173477 SnapDiffStart File-access-protocol support:[true] version
check:[true]
3996 f0c 01/29 15:35:24 173477 SnapDiffStart atime support:[true] version check:[true]
3996 f0c 01/29 15:35:25 173477 SnapDiffStart End-of-diff support: [true]
3996 f0c 01/29 15:35:26 173477 SnapDiffStart Successfully kicked off SnapDiff.
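When many jobs are involved, the fields of the SnapDiffStart line can be pulled out with a small parser. The sketch below is based only on the sample line shown above and assumes the wrapped log line has been joined back into a single line; the field list and the function name are illustrative.

```python
import re

# Matches "Name:[value]" and also "Items In Each Iteration [256]" (no colon).
FIELD = re.compile(
    r"(FileServer|Volume|SnapOne|SnapTwo|Format|Items In Each Iteration)"
    r"\s*:?\s*\[([^\]]*)\]"
)


def parse_snapdiff_start(line):
    """Return the bracketed fields of a SnapDiffStart log line as a dict of strings."""
    return dict(FIELD.findall(line))
```

Applied to the SnapDiffStart line above, this yields the file server, the volume, the two snapshot names, the format, and the batch size of 256.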
The SnapDiff API module scans the base and diff snapshots and identifies changed files using:
1. Inode [number]
2. Change-type attribute [values can be file_creation | inode_modification |
file_deletion]
Steps to capture the SnapDiff process from the CommVault side:
1. Open Process Manager on the Media Agent server responsible for IntelliSnap NAS iDA
backups.
2. Go to the Logging tab and select the module 'CVNasFileScan'.
3. Increase DbgLevel, LogFileSize, and LogFileMaxVer to higher values as desired.
In this example:
1. A CIFS share is set up on the NetApp NAS volume [/vol/CIFS].
2. Three files, A.txt, B.txt, and C.txt, are copied to the share.
The contents of the files are:
A.txt - 1,2
B.txt - 1,2,3,4
C.txt - 1,2,3,4,5,6
Steps to capture the SnapDiff process:
1. Run a 'Full' backup of the CIFS share /vol/CIFS using IntelliSnap - Example Job ID: 173473
2. Open CVNasFileScan.log using GxTail on the Media Agent and trace the log for Job ID: 173473
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - Changed file: Original Name
[/vol/CIFS/.snapshot/SP_2_173473_29_1454079783/Download/C.txt]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - inode: [103]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - change-type: [file_creation]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - ftype: [1]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - links: [1]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - fattr: [0x1ff]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - [0]=3 bits: UserId,groupId,sticky
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - [777]=Owner,Group,Other rwx
bits
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - owner: [0]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - group: [0]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - size: [6]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - crtime: [1454079502]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - mtime: [1454079523]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - ctime: [1454079523]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - atime: [1454079785]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - ntacl-inode: [-1]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - streamdir-inode: [-1]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - dos-bits: [1]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - nfsacl-inode: [-1]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - Modified file name
[/vol/CIFS/Download/C.txt]
4968 1be4 01/29 15:03:46 173473 IndexSnapShot_SnapDiff() - File Name sent to Indexing
[/vol/CIFS/Download/C.txt]
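The fattr and timestamp values in the log above are easier to read once decoded. The sketch below follows the log's own annotation (the top 3 bits are user ID, group ID, and sticky; the lower 9 bits are the owner/group/other rwx bits) and treats crtime, mtime, ctime, and atime as Unix epoch seconds. Displaying them in UTC is a choice made for this example.

```python
from datetime import datetime, timezone


def decode_fattr(fattr_hex):
    """Split a value like '0x1ff' into (special_bits, octal_permission_string)."""
    mode = int(fattr_hex, 16)
    special = (mode >> 9) & 0o7   # setuid / setgid / sticky bits
    perms = mode & 0o777          # owner / group / other rwx bits
    return special, format(perms, "o")


def epoch_to_utc(seconds):
    """Render an epoch value from the log as an ISO 8601 UTC timestamp."""
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).isoformat()
```

For the record above, decode_fattr("0x1ff") gives (0, "777"), matching the [0] and [777] annotations in the log, and epoch_to_utc(1454079502) gives 2016-01-29T14:58:22+00:00 for the crtime.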
As this is the first 'Full' IntelliSnap job, the Full and Incremental snaps are the same:
FullSnap: [SP_2_173473_29_1454079783]
IncrSnap: [SP_2_173473_29_1454079783]
This time we went back to the CIFS shared directory and made the following changes:
1. We left text files 'A' and 'B' untouched.
2. We modified text file 'C' and added two additional integers, 7 and 8 [previously it
contained 1,2,3,4,5,6].
We ran another SnapDiff incremental backup. As we can see below, SnapDiff identified
the changed file within seconds. In this example we expect to see only the 'C' text file, and there
it is. As expected, the other two text files do not appear among the SnapDiff changed files.
The CVNasFileScan log shows 'C.txt' being identified as a changed file with change-type
[inode_modification]:
3560 1984 01/29 15:11:49 173475 VolFetchSnapDiff() - Started SnapDiff fetch for volume [CIFS]
3560 1984 01/29 15:11:49 173475 VolFetchSnapDiff() - Will Fetch the chaged files in queue
element [0]
3560 1764 01/29 15:11:51 173475 IndexSnapShot_SnapDiff() - Indexing files at queue element
[0] <--This indicates any new files, as there were no new files, the value returned here is 0.
3560 1984 01/29 15:11:59 173475 VolFetchSnapDiff() - Fetched the chaged files in queue
element [0]
3560 1984 01/29 15:11:59 173475 VolFetchSnapDiff() - Will Fetch the chaged files in queue
element [1] <---This indicates, we have one changed file.
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - Changed file: Original Name
[/vol/CIFS/.snapshot/SP_2_173475_30_1454080264/Download/C.txt]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - inode: [103]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - change-type:
[inode_modification]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - ftype: [1]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - links: [1]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - fattr: [0x1ff]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - [0]=3 bits: UserId,groupId,sticky
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - [777]=Owner,Group,Other rwx
bits
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - owner: [0]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - group: [0]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - size: [8]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - crtime: [1454079502]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - mtime: [1454080236]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - ctime: [1454080236]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - atime: [1454080236]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - ntacl-inode: [-1]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - streamdir-inode: [-1]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - dos-bits: [1]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - nfsacl-inode: [-1]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - Modified file name
[/vol/CIFS/Download/C.txt]
3560 1764 01/29 15:11:59 173475 IndexSnapShot_SnapDiff() - File Name sent to Indexing
[/vol/CIFS/Download/C.txt]
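To compare records such as the two C.txt entries shown so far, the "name: [value]" attribute lines can be collected into a dictionary. This is a minimal sketch based only on the log format visible above; it assumes wrapped lines (such as the change-type line) have been joined, and it deliberately skips the file-name lines and the permission annotations, which do not follow the "name: [value]" shape.

```python
import re

# Captures e.g. "inode" / "103" from "... IndexSnapShot_SnapDiff() - inode: [103]".
ATTR = re.compile(r"IndexSnapShot_SnapDiff\(\) - ([\w-]+): \[([^\]]*)\]")


def parse_attributes(lines):
    """Collect the 'name: [value]' attributes of one changed-file record."""
    record = {}
    for line in lines:
        match = ATTR.search(line)
        if match:
            record[match.group(1)] = match.group(2)
    return record
```

Parsing both records makes the difference obvious: the inode stays at 103, while the change-type, size, mtime, and ctime differ.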
Incremental SnapDiff:
SnapDiffStart FileServer:[ONTAP7MODE] Volume:[CIFS]
SnapOne:[SP_2_173473_29_1454079783] SnapTwo:[SP_2_173475_30_1454080264]
FullSnap: [SP_2_173473_29_1454079783]
IncrSnap: [SP_2_173475_30_1454080264]
This time we went back to the CIFS shared directory and made the following changes:
1. We just touched the three files: opened them and closed them.
2. In other words, we only changed the access time of these files.
We ran another SnapDiff incremental backup. As we can see below, SnapDiff did not report any
changed file, as we had only touched the three files:
SnapDiffStart FileServer:[ONTAP7MODE] Volume:[CIFS]
SnapOne:[SP_2_173475_30_1454080264] SnapTwo:[SP_2_173476_31_1454081328]
FullSnap: [SP_2_173473_29_1454079783]
IncrSnap: [SP_2_173476_31_1454081328]
This time we went back to the CIFS shared directory and made the following changes:
1. We left text files 'A' and 'B' untouched.
2. We deleted the 'C.txt' text file.
We ran another SnapDiff incremental backup. As we can see below, SnapDiff identified the
changed file within seconds. In this example we expect to see only the 'C' text file, and there it is.
As expected, the other two text files do not appear among the SnapDiff changed files.
The CVNasFileScan log shows 'C.txt' being identified as a changed file with change-type [file_deletion]:
3996 f0c 01/29 15:35:27 173477 IndexSnapShot_SnapDiff() - Changed file: Original Name
[/vol/CIFS/.snapshot/SP_2_173476_31_1454081328/Download/C.txt]
3996 f0c 01/29 15:35:27 173477 IndexSnapShot_SnapDiff() - inode: [103]
3996 f0c 01/29 15:35:27 173477 IndexSnapShot_SnapDiff() - change-type: [file_deletion]
SnapDiff API XML interaction can be traced via Wireshark on the Media Agent. The trace
shows the following SnapDiff API calls in action:
snapdiff-iter-start
snapdiff-iter-next
snapdiff-iter-end
SnapDiff API XML interaction captured on the NetApp filer side:
snapdiff-iter-start
snapdiff-iter-next
snapdiff-iter-next
snapdiff-iter-end
Troubleshooting SnapDiff failures:
SnapDiff failures are not easy to troubleshoot, especially because CommVault only
makes use of the standard APIs and all the functionality is carried out by SnapDiff on
the NetApp filer itself. However, we can certainly do some investigation from our side to
make sure the basic requirements for SnapDiff operations are met [see the FAQ section].
From a broad perspective, there are three components that can cause SnapDiff
failures at the various API stages (snapdiff-iter-start, snapdiff-iter-next, snapdiff-iter-end).
The three components are:
1. SnapDiff engine
2. SnapDiff ZAPI
3. Underlying TCP/IP network that enables XML data exchange over HTTP/HTTPS
SnapDiff engine: Unless there is an identified bug, SnapDiff should normally work fine. Some
bugs identified for SnapDiff are listed towards the end of the document.
SnapDiff API: Like CommVault, other backup vendors also use the standard
SnapDiff APIs, so unless there is a configuration issue, it should be fine.
Network: This is probably the main gray area, causing 90% of the SnapDiff errors.
According to NetApp, some of the main causes are:
1. Misconfigured ISL (inter-switch link)
2. Misconfigured jumbo frames (set at some points but not all points in the network path).
Verify the vendor recommendations for MTU and tcp-adjust-mss settings.
3. Misconfigured host-side NIC teaming (such as HP network teaming)
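The jumbo-frame problem in item 2 comes down to the MTU not being the same at every point in the path. Once the MTU of each hop has been collected by hand, a consistency check is trivial. The hop names and helper functions below are hypothetical; the only fixed fact used is that an IPv4 TCP segment without options carries 40 bytes of IP and TCP headers.

```python
def mtu_is_consistent(path_mtus):
    """path_mtus is a list of (hop_name, mtu); True if every hop uses the same MTU."""
    return len({mtu for _name, mtu in path_mtus}) == 1


def effective_mtu(path_mtus):
    """The smallest MTU on the path limits the largest packet that passes unfragmented."""
    return min(mtu for _name, mtu in path_mtus)


def ipv4_tcp_mss(mtu):
    """Largest TCP payload per packet: MTU minus 20 bytes IPv4 and 20 bytes TCP header."""
    return mtu - 40
```

A path such as Media Agent 9000, switch 9000, ISL 1500, filer 9000 is inconsistent, with an effective MTU of 1500 and an MSS of 1460 instead of the 8960 the endpoints expect.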
Audit logs on the NetApp filer side are quite handy for monitoring ZAPI calls.
The audit logs are located in the 'log' directory under the ETC$ share (\\filername\ETC$\log) and
are named 'auditlog', 'auditlog.0', 'auditlog.1', etc. The index number indicates the age of the
log; the bigger the number, the older the log.
filer>options auditlog.enable on
Some APIs are considered read-only as these calls are only to obtain information from the filer.
By default, APIs considered read-only are not logged to the auditlog.
Logging can be manually enabled using the command:
filer>options auditlog.readonly_api.enable on
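Once the audit log has been copied off the filer, the SnapDiff calls can be counted with a simple text scan. The sketch below makes no assumption about the auditlog line layout; it only assumes that the ZAPI name appears somewhere in each logged line, which should be verified against a real log.

```python
from collections import Counter

SNAPDIFF_CALLS = ("snapdiff-iter-start", "snapdiff-iter-next", "snapdiff-iter-end")


def count_snapdiff_calls(lines):
    """Count how many lines mention each of the three SnapDiff ZAPI names."""
    counts = Counter()
    for line in lines:
        for call in SNAPDIFF_CALLS:
            if call in line:
                counts[call] += 1
    return counts
```

A start count that is higher than the end count is a hint that some sessions were never closed, which is worth checking against the 16-session limit.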
A packet trace between the controller and the application host server may help diagnose
network-level issues. Please see the NetApp KB articles listed under 'Recommended KB Articles
from NetApp' on how to run packet traces.
Please note: the SnapDiff API requires converting information retrieved from the filer into XML,
and UTF-8 is the default character encoding for XML. It is therefore important that the volume
against which SnapDiff is run is set to en_US.UTF-8 encoding at the time of volume creation.
**NetApp does not recommend changing the volume language after the files have been
copied/written**
If you turn on the create_ucode and convert_ucode flags on the volume later, please also do
the following:
1) Create a Snapshot copy.
2) On Windows: go to Start | Run and run the 'attrib /s /d' command in the volume, which is
mounted in Windows. This will convert all the names to Unicode in the active file system.
3) Disable i2p and then re-enable i2p.
Recommended KB Articles from NetApp:
How to collect a network trace with pktt in Data ONTAP 7-Mode
https://kb.netapp.com/support/index?page=content&id=1010155&actp=LIST
How to capture packet traces on Data ONTAP 8 Cluster-Mode systems
https://kb.netapp.com/support/index?page=content&id=1011204
NetApp KB articles related to SnapDiff:
API error: Parsing error in results: Extra content at the end of the document
https://kb.netapp.com/support/index?page=content&id=2018206
NetApp BUGs:
http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=533349
Degraded Snapdiff ZAPI performance when running on volume with create_ucode and
convert_ucode flag off
http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=533349
Snapdiff ZAPI with cifs file-access-protocol fails with "character out of allowed range" xml parse
error for filenames with invalid/reserved unicode characters
http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=588627
ONTAP panic when API snapdiff or inodepath takes more than 6 secs to process a large link
count file.
http://mysupport.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=773781
Controller disruption while using snapdiff to run indexing jobs