2. What is a Snapshot?
A snapshot is a copy of all or part of the file system at a given point in time.
A snapshot can be taken at any level of the file system.
Snapshot metadata is stored in the NameNode's FsImage file (the file system image).
HDFS Snapshots are read-only point-in-time copies of the file system.
Snapshots can be taken on a subtree of the file system or the entire file system.
Some common use cases of snapshots are data backup, protection against user
errors and disaster recovery.
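To illustrate the "protection against user errors" use case, here is a minimal sketch of restoring a file from a snapshot. It relies on HDFS exposing each snapshot read-only under <snapshottable_dir>/.snapshot/<snapshot_name>. The function names snapshot_path and restore_from_snapshot, the HDFS override variable, and the example paths are assumptions made for illustration, not part of HDFS.

```shell
#!/bin/sh
# Sketch only. HDFS may be overridden (e.g. HDFS="echo hdfs") to print
# the commands instead of running them against a cluster.
HDFS="${HDFS:-hdfs}"

# Print the read-only path under which a snapshot's contents are visible.
# $1 = snapshottable directory, $2 = snapshot name
snapshot_path() {
    printf '%s/.snapshot/%s\n' "${1%/}" "$2"
}

# Copy one file from a snapshot back into the live directory.
# $1 = snapshottable directory, $2 = snapshot name, $3 = path relative to $1
restore_from_snapshot() {
    dir="${1%/}"
    name="$2"
    rel="$3"
    $HDFS dfs -cp "$(snapshot_path "$dir" "$name")/$rel" "$dir/$rel"
}
```

For example, restore_from_snapshot /user s1 data.csv would copy /user/.snapshot/s1/data.csv back to /user/data.csv (s1 and data.csv are hypothetical names). A plain -cp fails if the destination already exists, so this fits the case where the file was deleted by mistake.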
3. About Snapshots
You can take a snapshot of the entire file system, or of any directory once you have made the
directory snapshottable (hdfs dfsadmin -allowSnapshot <path>).
There is no limit on the number of snapshottable directories.
Each snapshottable directory can hold up to 65,536 simultaneous snapshots.
After you have taken the first snapshot, subsequent snapshots consume only as much disk
space as the delta between the data already in the snapshot and the current state of the
live file system.
Snapshots are instantaneous and do not interfere with other HDFS operations.
Blocks in data nodes are not copied.
The snapshot files record the block list and the file size.
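The points above can be tried out with a short sketch like the following. It wraps the standard HDFS commands for enabling snapshots, taking a named snapshot, and comparing two snapshots. The function names, the HDFS override variable, and the snapshot names are assumptions for illustration. Note that dfsadmin -allowSnapshot needs HDFS superuser privileges (for example via sudo -u hdfs).

```shell
#!/bin/sh
# Sketch only. HDFS may be overridden (e.g. HDFS="echo hdfs") to print
# the commands instead of running them against a cluster.
HDFS="${HDFS:-hdfs}"

# Mark a directory snapshottable, then take a named snapshot of it.
# $1 = directory, $2 = snapshot name
take_snapshot() {
    dir="$1"
    name="$2"
    $HDFS dfsadmin -allowSnapshot "$dir" &&
    $HDFS dfs -createSnapshot "$dir" "$name"
}

# List every directory that currently allows snapshots.
list_snapshottable() {
    $HDFS lsSnapshottableDir
}

# Report what changed between two snapshots of the same directory.
# $1 = directory, $2 = older snapshot, $3 = newer snapshot
diff_snapshots() {
    $HDFS snapshotDiff "$1" "$2" "$3"
}
```

Taking s1, changing a few files, taking s2, and then running diff_snapshots /user s1 s2 shows the delta that the second snapshot has to account for (s1 and s2 are hypothetical snapshot names).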
4. Goals for HDFS Snapshots
Read-only snapshots.
Consistent (to be defined below) snapshots with respect to both NameNode (NN) and
DataNode (DN) state.
Sufficient for DR backup purposes.
Sufficient for producing consistent HBase snapshots.
Support maintaining and surfacing multiple (tens of) separate snapshots.
5. Snapshot Commands
Allow snapshot
sudo -u hdfs hdfs dfsadmin -allowSnapshot /user
Disallow snapshot
All snapshots of the directory must be deleted before this command can succeed.
sudo -u hdfs hdfs dfsadmin -disallowSnapshot /user
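Because disallowing requires that no snapshots remain, the two steps can be combined in a small sketch such as the one below. The function name remove_snapshot_support, the HDFS override variable, and the snapshot names are assumptions for illustration. The underlying dfs -deleteSnapshot and dfsadmin -disallowSnapshot commands are standard HDFS commands, and the latter needs superuser privileges.

```shell
#!/bin/sh
# Sketch only. HDFS may be overridden (e.g. HDFS="echo hdfs") to print
# the commands instead of running them against a cluster.
HDFS="${HDFS:-hdfs}"

# Delete the named snapshots of a directory, then disallow snapshots on it.
# $1 = directory, remaining arguments = snapshot names to delete
remove_snapshot_support() {
    dir="$1"
    shift
    for name in "$@"; do
        # Stop at the first failed deletion so disallow is not attempted.
        $HDFS dfs -deleteSnapshot "$dir" "$name" || return 1
    done
    $HDFS dfsadmin -disallowSnapshot "$dir"
}
```

For example, remove_snapshot_support /user s1 s2 deletes the hypothetical snapshots s1 and s2 and then disallows snapshots on /user. The caller has to pass every existing snapshot name, since the sketch does not discover them itself.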