HDFS Commands Guide for Managing Files and Directories
1. Clogeny Technologies http://www.clogeny.com
(US) 408-556-9645
(India) +91 20 661 43 482
HDFS
HDFS is designed to store very large amounts of
data: terabytes and petabytes.
Reliability is part of the core design of HDFS.
HDFS provides horizontal scalability.
The default block size of HDFS is 64 MB.
HDFS does not store files in the local filesystem, so
ordinary shell commands cannot see its files or
metadata.
To work with HDFS you use the hadoop fs
command.
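For example, listing and creating paths goes through hadoop fs rather than the local shell (a sketch: the paths are illustrative and a running cluster is assumed):

```shell
# HDFS is accessed through the hadoop fs command, not ordinary
# shell tools: 'ls /' shows the local root, not the HDFS root.
hadoop fs -ls /              # list the HDFS root
hadoop fs -mkdir /user/demo  # create a directory in HDFS
hadoop fs -du /user/demo     # show space used, in bytes
```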
What is HDFS not good at?
HDFS's architectural choices make it good at certain
workloads, but they also impose the following
limitations:
• Random-seek I/O performance is poor: HDFS is optimized
for streaming reads, i.e. long sequential reads.
• It is optimized for write-once, read-many workloads.
• It is not ideal for a large number of small files.
Configuring HDFS
core-site.xml and hdfs-site.xml contain the site-specific
values of Hadoop's configuration parameters, overriding
the built-in defaults.
These files are typically found in the /etc/hadoop/conf directory.
Configuration settings are a set of key-value pairs.
Some of the main properties are:
Key                     Value                        Example
fs.default.name         protocol://servername:port   hdfs://localhost:8020
dfs.replication         Number of replicas           1
dfs.namenode.name.dir   Pathname                     /var/lib/hadoop-hdfs/cache/${user.name}/dfs/name
dfs.datanode.data.dir   Pathname                     /var/lib/hadoop-hdfs/cache/${user.name}/dfs/data
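Put together, a minimal hdfs-site.xml using these keys might look like the following (the values are illustrative, not recommendations):

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/var/lib/hadoop-hdfs/cache/${user.name}/dfs/data</value>
  </property>
</configuration>
```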
Basic Hadoop FS Commands
Check your home directory
• $ hadoop fs -ls
• The command above lists your HDFS home directory if it
has been created; otherwise it prints an error message.
Different ways to create a file in HDFS
• Create a zero-byte file using touchz
• Copy a file from the local filesystem
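The two approaches look like this on the command line (a sketch: the paths are illustrative and a running cluster is assumed):

```shell
# Create a zero-byte file in HDFS
hadoop fs -touchz /user/demo/empty.txt

# Copy a file from the local filesystem into HDFS
hadoop fs -copyFromLocal localfile.txt /user/demo/localfile.txt
# (hadoop fs -put accepts local sources the same way)
```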
Quota
Name Quota:
• A hard limit on the number of file and directory names in
the tree
• File and directory creations fail once the quota is reached
• A newly created directory has no associated quota
• To set a quota:
$ hadoop dfsadmin -setQuota <N> <dir>...<dir>
• To remove a quota:
$ hadoop dfsadmin -clrQuota <dir>...<dir>
• To see the quota of a directory:
$ hadoop fs -count -q <dir>...<dir>
The first four columns of the output are the name quota, the remaining
name quota, the space quota, and the remaining space quota.
Space Quota:
• A hard limit on the number of bytes used by files in the tree
• Block allocations fail if the quota would not allow a full
block to be written
• A newly created directory has no associated quota
• The space quota takes replication into account
• To set a quota:
$ hadoop dfsadmin -setSpaceQuota <N> <dir>...<dir>
N is in bytes by default, but suffixes such as 10m, 10g, or 10t may be used
• To remove a quota:
$ hadoop dfsadmin -clrSpaceQuota <dir>...<dir>
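Because the space quota counts replicated bytes, the quota a file consumes is its size times its replication factor. A quick sketch of the arithmetic (plain shell, no cluster required; the numbers are illustrative):

```shell
# A 100 MB file stored with replication factor 3 consumes
# 300 MB of space quota: one copy of the bytes per replica.
file_mb=100
replication=3
quota_needed_mb=$(( file_mb * replication ))
echo "$quota_needed_mb"   # 300
```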
Using HDFS Programmatically
Creating configuration object
• To read from or write to HDFS, you need to create a
Configuration object and populate it from the Hadoop
configuration files.
Configuration conf = new Configuration();
conf.addResource(new Path("/opt/hadoop-0.20.0/conf/core-site.xml"));
conf.addResource(new Path("/opt/hadoop-0.20.0/conf/hdfs-site.xml"));
• If you do not load the Hadoop XML files into the conf
object, your HDFS operations will be performed on the
local file system and not on HDFS.
Adding Files in HDFS
String source = "/local/path/to/file.ext"; // local file to copy (illustrative path)
FileSystem fileSystem = FileSystem.get(conf);
Path path = new Path("/path/to/file.ext");
if (fileSystem.exists(path)) {
    System.out.println("File " + path + " already exists");
    return;
}
// Stream the local file into HDFS in 1 KB chunks
FSDataOutputStream out = fileSystem.create(path);
InputStream in = new BufferedInputStream(new FileInputStream(
    new File(source)));
byte[] b = new byte[1024];
int numBytes;
while ((numBytes = in.read(b)) > 0) {
    out.write(b, 0, numBytes);
}
in.close();
out.close();
fileSystem.close();
Reading a file from HDFS
FileSystem fileSystem = FileSystem.get(conf);
Path path = new Path("/path/to/file.ext");
if (!fileSystem.exists(path)) {
    System.out.println("File does not exist");
    return;
}
// Copy the HDFS file to a local file of the same name
FSDataInputStream in = fileSystem.open(path);
String filename = path.getName();
OutputStream out = new BufferedOutputStream(new FileOutputStream(
    new File(filename)));
byte[] b = new byte[1024];
int numBytes;
while ((numBytes = in.read(b)) > 0) {
    out.write(b, 0, numBytes);
}
in.close();
out.close();
fileSystem.close();
Creating a Directory in HDFS
String dir = "/path/to/dir"; // directory to create (illustrative path)
FileSystem fileSystem = FileSystem.get(conf);
Path path = new Path(dir);
if (fileSystem.exists(path)) {
    System.out.println("Dir " + dir + " already exists");
    return;
}
fileSystem.mkdirs(path);
fileSystem.close();
Deleting a file from HDFS
FileSystem fileSystem = FileSystem.get(conf);
Path path = new Path("/path/to/file.ext");
if (!fileSystem.exists(path)) {
    System.out.println("File does not exist");
    return;
}
fileSystem.delete(path, true); // true: delete recursively if path is a directory
fileSystem.close();
HDFS Web Interface
HDFS provides a web interface which can be
accessed on port 50070:
http://<virtual machine ip>:50070/dfshealth.jsp
Provides the ability to browse the filesystem
Provides the ability to view the NameNode logs
Provides an overall cluster summary