Hadoop Distributed File System
Made By:
Ameya Vijay Gokhale
14070121505
B.Tech (IT) 2014-18
INTRODUCTION
 Open-Source Software Framework (Apache Licensed)
 Based on Google File System (GFS)
 Two Main Components:
- HDFS
- MapReduce
 Master – Slave Architecture
HDFS ARCHITECTURE
COMPONENTS
 NameNode
 DataNode
 Secondary NameNode
NameNode
 Master
 Single Point of Contact
 Manages File System NameSpace and Metadata
 Metadata includes:
- List Of Files
- List of Blocks for each file
- List of DataNode for each Block
- File Attributes
 Maintains Transaction Log (the Edit Log)
 Only One Per Cluster
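The metadata listed above can be pictured as two lookup tables: one from file path to its attributes and ordered block list, and one from block to the DataNodes holding it. The sketch below is only an illustrative toy model, not Hadoop's implementation; the class names, method names, and the use of a replication count as the example file attribute are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """Per-file metadata: an example attribute plus the ordered block IDs."""
    replication: int  # assumed example of a "file attribute"
    block_ids: list[str] = field(default_factory=list)


@dataclass
class Namespace:
    """Toy in-memory model of NameNode metadata (hypothetical names)."""
    # file path -> file entry (the list of files and their blocks)
    files: dict[str, FileEntry] = field(default_factory=dict)
    # block ID -> names of DataNodes currently holding that block
    block_locations: dict[str, set[str]] = field(default_factory=dict)

    def add_file(self, path, block_ids, replication=3):
        self.files[path] = FileEntry(replication, list(block_ids))
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set())

    def record_replica(self, block_id, datanode):
        self.block_locations[block_id].add(datanode)

    def locate(self, path):
        """Return [(block_id, sorted DataNode names)] in file order."""
        entry = self.files[path]
        return [(b, sorted(self.block_locations[b])) for b in entry.block_ids]
```

A client asking where to read a file would, in this model, call `locate(path)` and then contact the listed DataNodes directly for the block contents.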
Secondary NameNode
 Maintains a copy of NameNode metadata
 Merges the namespace image with the edit log periodically,
or when the edit log becomes too large
 Runs on a different machine than the NameNode
 Its copy lags behind the Primary NameNode, so it is a
checkpoint helper and not a live standby
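The merge step can be sketched as replaying logged operations onto a snapshot of the namespace. This is a simplified illustration under stated assumptions: the image is modelled as a dict from path to block list, and the three operation names (`create`, `add_block`, `delete`) are hypothetical, not Hadoop's actual edit log format.

```python
def apply_edit(image, edit):
    """Apply one logged operation to the image in place."""
    op, path, *args = edit
    if op == "create":
        image[path] = []
    elif op == "add_block":
        image[path].append(args[0])
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown edit operation: {op}")


def checkpoint(image, edit_log):
    """Return a new image with every logged edit applied, in log order.

    The input image is left untouched; after a successful checkpoint
    the replayed portion of the edit log is no longer needed.
    """
    merged = {path: list(blocks) for path, blocks in image.items()}
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged
```

Any edits logged after the last checkpoint exist only on the primary, which is why the secondary's copy is always somewhat behind.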
DataNode
 Slaves
 Location where Data is Stored.
 Stores data in the local file system along with its
metadata, such as CRC checksums.
 Periodically sends a report of all existing blocks to the
NameNode
 A cluster can have thousands of DataNodes.
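As a rough illustration of the two DataNode duties above, the sketch below computes per-chunk CRC values for stored data and builds a block report. It uses Python's `zlib.crc32` as a stand-in checksum; the 512-byte chunk size, the function names, and the report layout are assumptions for this example, not Hadoop's on-disk or wire format.

```python
import zlib

CHUNK_SIZE = 512  # assumed number of bytes covered by each checksum


def chunk_checksums(data, chunk_size=CHUNK_SIZE):
    """Compute one CRC32 value per fixed-size chunk of the block data."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def verify(data, checksums, chunk_size=CHUNK_SIZE):
    """Return True if the data still matches the stored checksums."""
    return chunk_checksums(data, chunk_size) == checksums


def block_report(stored_blocks):
    """Summarise every locally stored block as (block_id, length).

    stored_blocks maps block ID -> bytes held on this node.
    """
    return sorted((bid, len(data)) for bid, data in stored_blocks.items())
```

Keeping checksums per chunk rather than per block means a corrupted region can be detected without rereading the entire block.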
Data Block
 Data is stored in Blocks
 64 MB / 128 MB per block (default depends on Hadoop version)
 Files are split into blocks, which are stored on DataNodes
 Large block size keeps disk seek time small relative to transfer time
 E.g.: assuming 10 ms of seek time and 100 MB/s as disk
transfer rate, a 100 MB block takes 1 s to transfer, so seek
time is 1% of transfer time, which is small enough to ignore.
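The arithmetic in the example, and the way a file maps onto fixed-size blocks, can be checked with a short sketch. The function names are hypothetical, and sizes are in whole megabytes for simplicity.

```python
def seek_overhead(seek_ms, transfer_mb_per_s, block_mb):
    """Seek time as a fraction of the time needed to transfer one block."""
    transfer_ms = block_mb / transfer_mb_per_s * 1000
    return seek_ms / transfer_ms


def split_into_blocks(file_mb, block_mb):
    """Sizes (in MB) of the blocks a file occupies; the last may be partial."""
    full, rest = divmod(file_mb, block_mb)
    return [block_mb] * full + ([rest] if rest else [])
```

With the numbers from the slide, `seek_overhead(10, 100, 100)` gives 0.01, i.e. 1%. A 300 MB file with 128 MB blocks yields `split_into_blocks(300, 128) == [128, 128, 44]`, so the final block only holds the remaining 44 MB.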
WORKING
ADVANTAGES
 Flexible
 Fault Tolerant
 Scalable
 Performs computation across several hosts
 Locality of Data
