46. Balancer
HDFS data might not always be placed uniformly across the DataNodes.
-policy <policy>           datanode (default): Cluster is balanced if each
                           datanode is balanced.
                           blockpool: Cluster is balanced if each block pool in
                           each datanode is balanced.
-threshold <threshold>     Percentage of disk capacity.
-exclude -f <hosts-file> | <comma-separated list of hosts>
                           Excludes the specified datanodes from being
                           balanced by the balancer.
-include -f <hosts-file> | <comma-separated list of hosts>
                           Includes only the specified datanodes to be
                           balanced by the balancer.
-source -f <hosts-file> | <comma-separated list of hosts>
                           Pick only the specified datanodes as source nodes.
-blockpools <comma-separated list of blockpool ids>
                           The balancer will only run on blockpools included
                           in this list.
-h|--help                  Display the tool usage and help information and
                           exit.
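A minimal sketch of how the options above might be combined. The helper names, the HDFS_BIN variable, and the hosts-file argument are illustrative assumptions, not part of the tool; only the balancer flags come from the table.

```shell
# Assumption: the hdfs launcher is on PATH. Set HDFS_BIN=echo for a dry run.
HDFS_BIN="${HDFS_BIN:-hdfs}"

# Hypothetical helper: balance at datanode granularity using threshold $1.
balance_cluster() {
  "$HDFS_BIN" balancer -policy datanode -threshold "$1"
}

# Hypothetical helper: use threshold $1 and leave the datanodes listed
# in hosts file $2 out of the balancing run.
balance_excluding() {
  "$HDFS_BIN" balancer -threshold "$1" -exclude -f "$2"
}

# Example (not executed here): balance_cluster 10
```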
48. Cache admin command-line interface
On the command line, administrators and users can interact with cache pools and
directives via the hdfs cacheadmin subcommand.
addDirective
removeDirective
listDirectives
addPool
modifyPool
removePool
listPools
help
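The subcommands listed above are typically used together: create a pool, add a directive to it, then list what is cached. The sketch below assumes the -path and -pool arguments described in the upstream Hadoop cacheadmin documentation (they are not shown on this slide); the helper names and HDFS_BIN are hypothetical.

```shell
# Assumption: the hdfs launcher is on PATH. Set HDFS_BIN=echo for a dry run.
HDFS_BIN="${HDFS_BIN:-hdfs}"

# Hypothetical helper: create cache pool $1, then cache path $2 in it.
cache_path() {
  "$HDFS_BIN" cacheadmin -addPool "$1" &&
  "$HDFS_BIN" cacheadmin -addDirective -path "$2" -pool "$1"
}

# Hypothetical helper: list the cache directives that belong to pool $1.
list_cached() {
  "$HDFS_BIN" cacheadmin -listDirectives -pool "$1"
}

# Example (not executed here): cache_path reports /data/reports
```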
50. crypto command-line interface
createZone: Create a new encryption zone.
  path      The path of the encryption zone to create. It must be an
            empty directory. A trash directory is provisioned under
            this path.
  keyName   Name of the key to use for the encryption zone.
            Uppercase key names are unsupported.
listZones: List all encryption zones. Requires superuser permissions.
provisionTrash: Provision a trash directory for an encryption zone.
  path      The path to the root of the encryption zone.
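As a rough illustration of the crypto subcommands, the sketch below wraps zone creation with a check for the uppercase-key restriction noted above. The helper names and HDFS_BIN are hypothetical, and the key is assumed to exist already in the configured key provider.

```shell
# Assumption: the hdfs launcher is on PATH. Set HDFS_BIN=echo for a dry run.
HDFS_BIN="${HDFS_BIN:-hdfs}"

# Hypothetical helper: make empty directory $2 an encryption zone using key $1.
create_zone() {
  case "$1" in
    *[[:upper:]]*)
      # Uppercase key names are unsupported, so refuse them up front.
      echo "key name must not contain uppercase letters: $1" >&2
      return 1
      ;;
  esac
  "$HDFS_BIN" crypto -createZone -keyName "$1" -path "$2"
}

# Hypothetical helpers for the other two subcommands.
list_zones()      { "$HDFS_BIN" crypto -listZones; }
provision_trash() { "$HDFS_BIN" crypto -provisionTrash -path "$1"; }

# Example (not executed here): create_zone mykey /secure
```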
51. Data node
COMMAND_OPTION             Description
-regular                   Normal datanode startup (default).
-rollback                  Roll back the datanode to the previous version. This
                           should be used after stopping the datanode and
                           distributing the old Hadoop version.
-rollingupgrade rollback   Roll back a rolling upgrade operation.
Runs an HDFS datanode.
53. haadmin: administer your HA (High Availability) HDFS cluster
COMMAND_OPTION             Description
-checkHealth               Check the health of the given NameNode.
-getServiceState           Determine whether the given NameNode is Active or
                           Standby.
-getAllServiceState        Return the state of all the NameNodes.
-transitionToActive        Transition the state of the given NameNode to Active.
-transitionToStandby       Transition the state of the given NameNode to Standby.
-help [cmd]                Display help for the given command, or for all
                           commands if none is specified.
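One way the haadmin options might be chained is to promote a NameNode only when its health check succeeds. This is a sketch: the helper names and HDFS_BIN are hypothetical, nn1 in the example is a placeholder service ID, and clusters with automatic failover enabled may refuse manual transitions.

```shell
# Assumption: the hdfs launcher is on PATH. Set HDFS_BIN=echo for a dry run.
HDFS_BIN="${HDFS_BIN:-hdfs}"

# Hypothetical helper: print whether NameNode $1 is Active or Standby.
ha_state() {
  "$HDFS_BIN" haadmin -getServiceState "$1"
}

# Hypothetical helper: transition NameNode $1 to Active only if the
# health check exits successfully.
promote_if_healthy() {
  "$HDFS_BIN" haadmin -checkHealth "$1" &&
  "$HDFS_BIN" haadmin -transitionToActive "$1"
}

# Example (not executed here): promote_if_healthy nn1
```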
56. dfsadmin
-printTopology             Print a tree of the racks and their nodes as reported
                           by the NameNode.
-refreshNamenodes datanodehost:port
                           For the given datanode, reloads the configuration
                           files, stops serving the removed block pools and
                           starts serving new block pools.
-getVolumeReport datanodehost:port
                           For the given datanode, get the volume report.
-deleteBlockPool datanode-host:port blockpoolId [force]
                           If force is passed, the block pool directory for the
                           given blockpool id on the given datanode is deleted
                           along with its contents; otherwise the directory is
                           deleted only if it is empty. The command will fail
                           if the datanode is still serving the block pool.
                           Refer to refreshNamenodes to shut down a block pool
                           service on a datanode.
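The dependency between refreshNamenodes and deleteBlockPool described above can be sketched as a two-step helper. The function name and HDFS_BIN are hypothetical; the host:port and block pool ID are supplied by the caller.

```shell
# Assumption: the hdfs launcher is on PATH. Set HDFS_BIN=echo for a dry run.
HDFS_BIN="${HDFS_BIN:-hdfs}"

# Hypothetical helper: $1 = datanode host:port, $2 = block pool ID.
# Step 1 reloads the datanode configuration so it stops serving removed
# block pools. Step 2 deletes the pool directory only if it is empty,
# because the trailing "force" argument is deliberately omitted.
retire_block_pool() {
  "$HDFS_BIN" dfsadmin -refreshNamenodes "$1" &&
  "$HDFS_BIN" dfsadmin -deleteBlockPool "$1" "$2"
}

# Example (not executed here): retire_block_pool <datanode-host:port> <blockpoolId>
```

Leaving out force is the conservative choice here: a non-empty directory makes the second step fail instead of discarding data.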
57. dfsadmin
-refreshServiceAcl         Reload the service-level authorization policy file.
-refreshUserToGroupsMappings
                           Refresh user-to-groups mappings.
-refreshSuperUserGroupsConfiguration
                           Refresh superuser proxy groups mappings.
-refreshCallQueue          Reload the call queue from config.
-reconfig <datanode|namenode> <host:ipc_port> <start|status|properties>
                           Starts reconfiguration or gets the status of an
                           ongoing reconfiguration, or gets a list of
                           reconfigurable properties. The second parameter
                           specifies the node type.
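The three -reconfig actions are often run in sequence: inspect what can be changed, start the reconfiguration, then poll its status. A minimal sketch, with a hypothetical helper name and HDFS_BIN variable:

```shell
# Assumption: the hdfs launcher is on PATH. Set HDFS_BIN=echo for a dry run.
HDFS_BIN="${HDFS_BIN:-hdfs}"

# Hypothetical helper: $1 = datanode or namenode, $2 = host:ipc_port.
reconfigure() {
  "$HDFS_BIN" dfsadmin -reconfig "$1" "$2" properties &&
  "$HDFS_BIN" dfsadmin -reconfig "$1" "$2" start &&
  "$HDFS_BIN" dfsadmin -reconfig "$1" "$2" status
}

# Example (not executed here): reconfigure datanode <host:ipc_port>
```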
59. storagepolicies
Hot - for both storage and compute. When a block is hot, all replicas are stored in DISK.
Cold - only for storage with limited compute. When a block is cold, all replicas are stored
in ARCHIVE.
Warm - partially hot and partially cold. When a block is warm, some of its replicas are
stored in DISK and the remaining replicas are stored in ARCHIVE.
All_SSD - for storing all replicas in SSD.
One_SSD - for storing one of the replicas in SSD. The remaining replicas are stored in DISK.
Lazy_Persist - for writing blocks with single replica in memory. The replica is first written
in RAM_DISK and then it is lazily persisted in DISK.
60. Storage Policy Commands
COMMAND_OPTION             Description
hdfs storagepolicies -listPolicies
                           List all the storage policies.
hdfs storagepolicies -setStoragePolicy -path <path> -policy <policy>
                           Set a storage policy on a file or a directory.
hdfs storagepolicies -unsetStoragePolicy -path <path>
                           Unset the storage policy of a file or a directory.
                           After the unset command the storage policy of the
                           nearest ancestor will apply, and if there is no
                           policy on any ancestor then the default storage
                           policy will apply.
hdfs storagepolicies -getStoragePolicy -path <path>
                           Get the storage policy of a file or a directory.

-path <path>               The path referring to either a directory or a file.
-policy <policy>           The name of the storage policy.
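The commands above can be combined into a set-then-verify step. In this sketch the helper names and HDFS_BIN are hypothetical, and COLD in the example assumes the policy names are passed in upper case, as printed by -listPolicies.

```shell
# Assumption: the hdfs launcher is on PATH. Set HDFS_BIN=echo for a dry run.
HDFS_BIN="${HDFS_BIN:-hdfs}"

# Hypothetical helper: apply policy $2 to path $1, then read it back.
set_policy() {
  "$HDFS_BIN" storagepolicies -setStoragePolicy -path "$1" -policy "$2" &&
  "$HDFS_BIN" storagepolicies -getStoragePolicy -path "$1"
}

# Hypothetical helper: drop the policy on path $1 so the nearest ancestor's
# policy (or the default policy) applies again.
unset_policy() {
  "$HDFS_BIN" storagepolicies -unsetStoragePolicy -path "$1"
}

# Example (not executed here): set_policy /archive/2019 COLD
```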