1. HADOOP DISTRIBUTED FILE SYSTEM
PRESENTED BY:
Koushik Mondal
B.Tech
Information Technology
3rd Year, 6th Semester
Roll no- 16900215021
Registration no- 151690110147
2. Index
1. Hadoop
2. Hadoop Components
3. Distributed File System
4. Main Components of HDFS
5. HDFS Architecture
6. Anatomy of a File Read in HDFS
7. Anatomy of a File Write in HDFS
8. Basic Commands in HDFS
9. Conclusion
10. References
3. WHAT IS HADOOP?
Hadoop is a framework for storing and
processing Big Data in a distributed environment
across clusters of computers using simple
programming models.
It is open-source data management software, so it
is freely available and can be configured
to suit our requirements.
4. HADOOP COMPONENTS
Hadoop Distributed File System (HDFS)
It is used to store Big Data
MapReduce
It is used to process Big Data
5. DISTRIBUTED FILE SYSTEM
A Distributed File System
(DFS) is a file system that
allows multiple hosts to
access shared files over
a computer network.
6. MAIN COMPONENTS OF HDFS
NameNode:
Master of the system
Maintains metadata for, and manages,
the blocks stored on
the DataNodes
DataNode:
Slaves (workers) deployed on
each machine that provide the
actual storage
Responsible for serving read
and write requests from
clients
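The division of labour above can be illustrated with a toy model. This is only a sketch, not how HDFS is implemented: the class name, the round-robin placement, and the node names are assumptions made for illustration (real HDFS placement is rack-aware), and the 128 MB block size and replication factor of 3 are the usual defaults in Hadoop 2.x and later.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # assumed default block size (Hadoop 2.x and later)
REPLICATION = 3                 # assumed default replication factor


@dataclass
class ToyNameNode:
    """Toy master: keeps only metadata, never the file contents."""
    datanodes: list
    # file path -> list of (block_id, [names of DataNodes holding a replica])
    block_map: dict = field(default_factory=dict)
    next_block_id: int = 0

    def add_file(self, path, size_bytes):
        n_blocks = max(1, -(-size_bytes // BLOCK_SIZE))  # ceiling division
        n_replicas = min(REPLICATION, len(self.datanodes))
        blocks = []
        for _ in range(n_blocks):
            # Round-robin placement, purely illustrative.
            start = self.next_block_id % len(self.datanodes)
            replicas = [self.datanodes[(start + i) % len(self.datanodes)]
                        for i in range(n_replicas)]
            blocks.append((self.next_block_id, replicas))
            self.next_block_id += 1
        self.block_map[path] = blocks
        return blocks


nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
for block_id, replicas in nn.add_file("/data/log.txt", 300 * 1024 * 1024):
    print(block_id, replicas)
```

The NameNode here stores only the block map; the bytes themselves would live on the DataNodes named in each replica list.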
8. ANATOMY OF A FILE READ IN HDFS
[Figure: anatomy of a file read in HDFS]
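In outline, a read works as follows: the client asks the NameNode which blocks make up the file and where their replicas are, then fetches each block directly from a DataNode. The sketch below mimics that flow with in-memory dictionaries; every name in it (`namenode`, `datanodes`, `read_file`, the block IDs) is a hypothetical stand-in, not part of the Hadoop API.

```python
# Hypothetical stand-ins for a NameNode's block map and three DataNodes.
namenode = {
    "/data/notes.txt": [("blk_0", ["dn1", "dn2"]), ("blk_1", ["dn2", "dn3"])],
}
datanodes = {
    "dn1": {"blk_0": b"hello "},
    "dn2": {"blk_0": b"hello ", "blk_1": b"hdfs"},
    "dn3": {"blk_1": b"hdfs"},
}


def read_file(path, dead=()):
    """Return the file's bytes, skipping DataNodes listed in `dead`."""
    data = b""
    # Step 1: ask the NameNode for the ordered block list and replica locations.
    for block_id, locations in namenode[path]:
        # Step 2: read the block directly from the first live DataNode.
        for dn in locations:
            if dn not in dead:
                data += datanodes[dn][block_id]
                break
        else:
            raise IOError(f"no live replica for {block_id}")
    return data


print(read_file("/data/notes.txt"))
print(read_file("/data/notes.txt", dead={"dn1"}))  # falls back to another replica
```

Note that file data never passes through the NameNode; it only answers the metadata lookup.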
9. ANATOMY OF A FILE WRITE IN HDFS
[Figure: anatomy of a file write in HDFS]
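In outline, a write sends each block through a pipeline of DataNodes: the client sends data to the first node, each node stores its replica and forwards the data to the next, and acknowledgements travel back in the reverse direction. The function below is a simplified sketch of that idea only; `write_block` and the node names are hypothetical, and real HDFS streams data in small packets rather than whole blocks.

```python
def write_block(block_id, payload, pipeline, storage):
    """Store `payload` on each DataNode in pipeline order; return the ack order."""
    if not pipeline:
        return []
    head, rest = pipeline[0], pipeline[1:]
    # The current DataNode stores its replica...
    storage.setdefault(head, {})[block_id] = payload
    # ...then forwards the data to the rest of the pipeline.
    acks = write_block(block_id, payload, rest, storage)
    # Acks flow back up the pipeline, so the last node acknowledges first.
    return acks + [head]


storage = {}
acks = write_block("blk_0", b"abc", ["dn1", "dn2", "dn3"], storage)
print(acks)
print(sorted(storage))
```

The client treats the block as written once the acknowledgement from the first DataNode in the pipeline arrives, which here is the last element of `acks`.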
10. BASIC COMMANDS IN HDFS
To run the Hadoop background processes (daemons)
To create a directory in HDFS
To display the list of files in HDFS
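The commands for these three tasks are not preserved in the text, so the listing below is a guess at plausible equivalents rather than what the slide showed. It assumes a Hadoop 2.x or later installation with `start-dfs.sh` and `hdfs` on the `PATH`; the directory `/user/demo` is a made-up example.

```python
import shlex

# Assumed commands and example path; adjust to your installation.
COMMANDS = {
    "start the HDFS daemons": ["start-dfs.sh"],
    "create a directory": ["hdfs", "dfs", "-mkdir", "-p", "/user/demo"],
    "list files": ["hdfs", "dfs", "-ls", "/user/demo"],
}

for purpose, argv in COMMANDS.items():
    print(f"{purpose}: {shlex.join(argv)}")
```

On a machine with a Hadoop client installed, each list could be passed to `subprocess.run(argv, check=True)` to execute it.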
11. BASIC COMMANDS IN HDFS
To copy a file in HDFS
To move a file in HDFS
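As a sketch of what copying and moving within HDFS might look like, the snippet below builds the `hdfs dfs -cp` and `hdfs dfs -mv` command lines. The source and destination paths are invented for illustration, and the slide's original commands may have differed.

```python
import shlex

# Example paths are assumptions, not taken from the slides.
COMMANDS = {
    "copy a file": ["hdfs", "dfs", "-cp",
                    "/user/demo/notes.txt", "/user/backup/notes.txt"],
    "move a file": ["hdfs", "dfs", "-mv",
                    "/user/demo/notes.txt", "/user/archive/notes.txt"],
}

for purpose, argv in COMMANDS.items():
    print(f"{purpose}: {shlex.join(argv)}")
```

Both source and destination here are HDFS paths; transfers to or from the local file system use different subcommands.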
12. BASIC COMMANDS IN HDFS
To load a file from the local file system (LFS) into HDFS
To copy a file from HDFS to the LFS
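One plausible way to move files between the local file system and HDFS is with `-put` and `-get` (the equivalent `-copyFromLocal` and `-copyToLocal` subcommands also exist). The file names and paths below are assumptions chosen for the example.

```python
import shlex

# Example file names and paths are assumptions.
COMMANDS = {
    "load a local file into HDFS": ["hdfs", "dfs", "-put",
                                    "notes.txt", "/user/demo/"],
    "copy an HDFS file to the LFS": ["hdfs", "dfs", "-get",
                                     "/user/demo/notes.txt", "."],
}

for purpose, argv in COMMANDS.items():
    print(f"{purpose}: {shlex.join(argv)}")
```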
13. BASIC COMMANDS IN HDFS
To check the health of a directory
To check the cluster balance
To count directories and files
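The administrative commands most likely intended here are `hdfs fsck`, `hdfs balancer`, and `hdfs dfs -count`, though the original slide content is not recoverable, so treat this as an assumption. Note that `hdfs balancer` does not merely report: it moves blocks between DataNodes until disk usage is evened out. The path `/user/demo` is an example.

```python
import shlex

# Assumed commands; the example path is hypothetical.
COMMANDS = {
    "check the health of a directory": ["hdfs", "fsck", "/user/demo"],
    "rebalance the cluster": ["hdfs", "balancer"],
    "count directories and files": ["hdfs", "dfs", "-count", "/user/demo"],
}

for purpose, argv in COMMANDS.items():
    print(f"{purpose}: {shlex.join(argv)}")
```

`-count` reports the number of directories, the number of files, and the total content size under the given path.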
14. BASIC COMMANDS IN HDFS
To show the contents of a file
To delete a file in HDFS
To delete a directory in HDFS
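A sketch of the likely commands for viewing and deleting, assuming the current `hdfs dfs` syntax: `-cat` prints a file, `-rm` deletes a file, and `-rm -r` deletes a directory recursively. The paths are invented for the example.

```python
import shlex

# Example paths are assumptions.
COMMANDS = {
    "show the contents of a file": ["hdfs", "dfs", "-cat", "/user/demo/notes.txt"],
    "delete a file": ["hdfs", "dfs", "-rm", "/user/demo/notes.txt"],
    "delete a directory": ["hdfs", "dfs", "-rm", "-r", "/user/demo"],
}

for purpose, argv in COMMANDS.items():
    print(f"{purpose}: {shlex.join(argv)}")
```

If the trash feature is enabled on the cluster, deleted items are moved to the user's trash directory rather than removed immediately.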
15. CONCLUSION
Hadoop has been a very effective solution for
companies dealing with petabytes of data.
It has solved many industry problems related to
large-scale data management and distributed systems.
Because it is open source, it has been widely
adopted by companies.
16. REFERENCES
BOOK:
Hadoop: The Definitive Guide, 4th Edition, by Tom White
LINKS:
http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html
https://developer.yahoo.com/hadoop/tutorial/module2.html
https://www.edureka.co/blog/apache-hadoop-hdfs-architecture/