Hadoop distributed file system

Hadoop Distributed File System
Made By:
Ameya Vijay Gokhale
14070121505
B.Tech (IT) 2014-18

INTRODUCTION
 Open-Source Software Framework (Apache Licensed)
 HDFS design is based on the Google File System (GFS)
 Two Main Components:
- HDFS
- MapReduce
 Master – Slave Architecture

COMPONENTS
 NameNode
 DataNode
 Secondary NameNode

NameNode
 Master
 Single Point of Contact
 Manages the File System Namespace and Metadata
 Metadata includes:
- List Of Files
- List of Blocks for each file
- List of DataNodes for each Block
- File Attributes
 Maintains a Transaction Log (the edit log)
 Only One Per Cluster
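The metadata listed above can be pictured as a few in-memory maps. The following is a minimal sketch, not Hadoop's actual data structures: the class names `FileEntry` and `ToyNameNode`, the operation strings written to the log, and the DataNode names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class FileEntry:
    """Per-file metadata: attributes plus the ordered list of block IDs."""
    replication: int                                    # a file attribute
    block_ids: List[int] = field(default_factory=list)  # list of blocks for the file


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NameNode's metadata (illustration only)."""
    files: Dict[str, FileEntry] = field(default_factory=dict)           # list of files
    block_locations: Dict[int, Set[str]] = field(default_factory=dict)  # block -> DataNodes
    edit_log: List[str] = field(default_factory=list)                   # transaction log

    def create_file(self, path: str, replication: int = 3) -> None:
        self.files[path] = FileEntry(replication)
        self.edit_log.append(f"CREATE {path}")

    def add_block(self, path: str, block_id: int, datanodes: List[str]) -> None:
        self.files[path].block_ids.append(block_id)
        self.block_locations[block_id] = set(datanodes)
        self.edit_log.append(f"ADD_BLOCK {path} {block_id}")

    def locate(self, path: str) -> List[Tuple[int, List[str]]]:
        """Return (block_id, DataNodes) pairs for a file, in block order."""
        return [(b, sorted(self.block_locations[b]))
                for b in self.files[path].block_ids]


nn = ToyNameNode()
nn.create_file("/logs/a.txt")
nn.add_block("/logs/a.txt", 1, ["dn1", "dn2"])
nn.add_block("/logs/a.txt", 2, ["dn2", "dn3"])
print(nn.locate("/logs/a.txt"))
```

A client would ask the NameNode only for this lookup and then read the block contents directly from the DataNodes it names.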

Secondary NameNode
 Maintains a copy of NameNode metadata
 Periodically merges the namespace image with the edit
log, so that the edit log does not become too large
 Runs on a different machine from the NameNode
 Its copy lags behind the Primary NameNode, so it is a
checkpoint helper, not a hot standby
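The periodic merge can be sketched as replaying the logged operations onto a copy of the namespace image and then starting a fresh, empty log. This is a simplified illustration under stated assumptions: the function name `checkpoint`, the `(operation, path)` log format, and the two operation names are hypothetical and do not reflect Hadoop's on-disk formats.

```python
def checkpoint(fsimage, edit_log):
    """Replay edit_log onto a copy of fsimage; return (new_image, empty_log)."""
    new_image = dict(fsimage)          # leave the old image untouched
    for op, path in edit_log:
        if op == "CREATE":
            new_image[path] = []       # new file with no blocks yet
        elif op == "DELETE":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return new_image, []               # the log restarts empty after the merge


image = {"/a": []}
log = [("CREATE", "/b"), ("DELETE", "/a")]
image, log = checkpoint(image, log)
print(image, log)
```

Any operation logged after the last merge exists only in the primary's edit log, which is why the secondary's copy is always somewhat behind.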

DataNode
 Slaves
 Location where Data is Stored.
 Stores data in the Local File System along with its
metadata, such as CRC checksums.
 Periodically sends a report of all existing blocks to the
NameNode
 Can be thousands in a cluster.
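The two DataNode duties above, keeping a checksum next to each block and reporting the blocks it holds, can be sketched as follows. This is an illustrative toy, assuming one CRC-32 per whole block computed with Python's `zlib.crc32`; the class `ToyDataNode` and its methods are hypothetical, and the real on-disk layout is not modelled.

```python
import zlib


class ToyDataNode:
    """Hypothetical DataNode that keeps blocks in memory with a CRC-32 each."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> (data bytes, stored checksum)

    def store(self, block_id, data):
        self.blocks[block_id] = (data, zlib.crc32(data))

    def read(self, block_id):
        data, expected = self.blocks[block_id]
        # Recompute the checksum on read to detect silent corruption.
        if zlib.crc32(data) != expected:
            raise IOError(f"block {block_id} is corrupt")
        return data

    def block_report(self):
        """IDs of all blocks held, as would be sent periodically to the NameNode."""
        return sorted(self.blocks)


dn = ToyDataNode("dn1")
dn.store(7, b"hello")
dn.store(3, b"world")
print(dn.block_report())
```

When a read fails the checksum test, the block can be fetched from another DataNode that holds a replica.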

Data Block
 Data is stored in Blocks
 64 MB / 128 MB per block (defaults in Hadoop 1.x / 2.x)
 Files are split and are stored on DataNodes
 Large Block size for Minimum Disk Seek Times
 E.g.: Assuming 10 ms of seek time and 100 MB/s as disk
transfer rate, a 100 MB block takes 1 s to transfer, so seek
time is 1% of transfer time, which is small enough to ignore.
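The block arithmetic above can be checked with a short sketch. The helper names `split_into_blocks` and `seek_overhead` are hypothetical; the 300 MB file is an invented example, while the seek time, transfer rate, and block size come from the slide.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Sizes of the blocks a file occupies; only the last block may be short."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def seek_overhead(seek_s, transfer_bytes_per_s, block_bytes):
    """Seek time as a fraction of the time needed to transfer one block."""
    transfer_s = block_bytes / transfer_bytes_per_s
    return seek_s / transfer_s


# A 300 MB file with 128 MB blocks: two full blocks and one 44 MB block.
print([size // MB for size in split_into_blocks(300 * MB, 128 * MB)])

# Slide example: 10 ms seek, 100 MB/s transfer, 100 MB block.
print(seek_overhead(0.010, 100e6, 100e6))

# The same disk with a 1 MB block spends as long seeking as transferring.
print(seek_overhead(0.010, 100e6, 1e6))
```

The last line shows why small blocks are costly: the disk spends as much time positioning the head as it does moving data.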

ADVANTAGES
 Flexible
 Fault Tolerant
 Scalable
 Performs computation across several hosts
 Locality of Data
