Hadoop Distributed File System

Transcript of "Hadoop Distributed File System"

  1. DFS: Distributed File System
  2. Share Files Easily in a Public Folder
  3. What about this type of network?
  4. What Is DFS in the Real World? DFS allows administrators to consolidate file shares that may exist on multiple servers so that they appear to live in the same location and users can access them from a single point on the network.
  5. Example:
  6. Benefits of DFS
     • Resource management – users access all resources through a single point
     • Accessibility – users do not need to know the physical location of a shared folder; they can navigate to it through Explorer and the domain tree
     • Fault tolerance – shares can be replicated, so if the server in Chicago goes down, resources will still be available to users
     • Workload management – DFS allows administrators to distribute shared folders and workloads across several servers for more efficient use of network and server resources
  7. Hadoop
  8. Assumptions and Goals (1)
     • An HDFS instance may consist of thousands of servers
     • With that many components, some part of HDFS is always non-functional
     • Automatic recovery from faults is an architectural goal of HDFS
  9. Assumptions and Goals (2)
     • HDFS applications need streaming access to their data sets
     • HDFS is designed for batch processing rather than interactive use by users
     • HDFS holds large data sets, with sizes in the gigabyte to terabyte range
  10. Assumptions and Goals (3)
     • Moving computation is cheaper than moving data
     • Portability across heterogeneous hardware and software
  11. NameNode and DataNodes (1)
     • Master/slave architecture
     • An HDFS cluster consists of:
       - A single NameNode: a master server that manages the file system namespace and regulates access to files by clients
       - A number of DataNodes: one per node in the cluster, each managing the storage attached to the node it runs on
  12. NameNode and DataNodes (2)
     • Internally, a file is split into one or more blocks, and these blocks are stored in a set of DataNodes
     • The NameNode executes file system namespace operations such as opening, closing, and renaming files and directories
  13. NameNode and DataNodes (3)
     • The DataNodes are responsible for serving read and write requests from the file system's clients
     • The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode
  14. NameNode and DataNodes (4)
  15. NameNode and DataNodes (5)
     • HDFS nodes typically run a GNU/Linux operating system (OS)
     • HDFS is built using the Java language
  16. File System NameSpace (1)
     • HDFS supports a traditional hierarchical file organization
     • HDFS does not yet implement user access permissions
     • HDFS does not support hard links or soft links
     • The NameNode maintains the file system namespace
  17. File System NameSpace (2)
     • An application can specify the number of replicas of a file that should be maintained by HDFS
     • The number of copies of a file is called the replication factor of that file
  18. Data Replication (1)
     • HDFS reliably stores very large files across machines in a large cluster
     • It stores each file as a sequence of blocks
     • All blocks in a file except the last one are the same size
     • The block size and replication factor are configurable per file
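The block layout on slide 18 can be illustrated with a little arithmetic. The following is a minimal sketch, not HDFS code: the function names `split_into_blocks` and `raw_storage` are hypothetical, and the sizes used in the usage note are arbitrary illustration values.

```python
from typing import List


def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the size of every block of a file (hypothetical helper).

    All blocks are block_size bytes except possibly the last one,
    which holds whatever is left over.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the shorter last block
    return sizes


def raw_storage(file_size: int, replication_factor: int) -> int:
    """Bytes consumed across the cluster when every block is replicated."""
    return file_size * replication_factor
```

For example, `split_into_blocks(300, 128)` gives two full blocks and one shorter final block, and `raw_storage(300, 3)` shows that a replication factor of 3 triples the raw space used.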
  19. Data Replication (2)
     • The NameNode makes all decisions regarding the replication of blocks
     • It periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster
  20. Data Replication (3)
     • Receipt of a Heartbeat implies that the DataNode is functioning properly
     • A Blockreport contains a list of all blocks on a DataNode
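Slides 19 and 20 describe how the NameNode learns about the cluster from Heartbeats and Blockreports. The sketch below shows one way that bookkeeping could look. It is an illustration under stated assumptions, not the NameNode implementation: the class `ClusterView`, its method names, and the timeout value are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Iterable, List, Set


class ClusterView:
    """Hypothetical NameNode-side view built from Heartbeats and Blockreports."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout                      # seconds without a Heartbeat before a node counts as dead
        self.last_heartbeat: Dict[str, float] = {}  # DataNode name -> time of last Heartbeat
        self.blocks_on: Dict[str, Set[str]] = {}    # DataNode name -> blocks from its last Blockreport

    def heartbeat(self, node: str, now: float) -> None:
        self.last_heartbeat[node] = now

    def blockreport(self, node: str, blocks: Iterable[str]) -> None:
        self.blocks_on[node] = set(blocks)

    def live_nodes(self, now: float) -> Set[str]:
        return {n for n, t in self.last_heartbeat.items() if now - t <= self.timeout}

    def under_replicated(self, now: float, replication: int) -> List[str]:
        """Blocks with fewer live replicas than the requested replication factor."""
        live = self.live_nodes(now)
        counts: Dict[str, int] = {}
        for node, blocks in self.blocks_on.items():
            for block in blocks:
                counts.setdefault(block, 0)
                if node in live:
                    counts[block] += 1
        return sorted(b for b, c in counts.items() if c < replication)
```

A block that appears in this list is a candidate for re-replication: the NameNode would instruct a live DataNode to copy it, as described on slide 13.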
  21. Data Replication (4)
  22. File System Metadata (1)
  23. File System Metadata (2)
     • EditLog – records every change made to the file system
     • FsImage – stores the block mapping and file system properties
  24. File System Metadata (3)
     • The NameNode keeps an image of the entire file system namespace and file Blockmap in memory
     • This key metadata is compact (4 GB of RAM is enough for a huge number of files)
     • Checkpoint
       – When the NameNode (NN) starts up, it reads the FsImage and EditLog
       – It applies all the transactions from the EditLog to the in-memory representation of the FsImage
       – It flushes this new version out into a new FsImage on disk
       – A checkpoint only occurs when the NameNode starts up
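The checkpoint steps on slide 24 amount to replaying a log over a snapshot. The following is a minimal sketch of that idea only. The record format (tuples such as `("create", path, replication)`), the function names, and the choice to return an empty log after the checkpoint are assumptions made for illustration; the real FsImage and EditLog are on-disk formats that this sketch does not model.

```python
import copy
from typing import Dict, List, Tuple


def apply_edit(namespace: Dict[str, dict], edit: Tuple) -> None:
    """Apply one hypothetical EditLog record to the in-memory namespace."""
    op = edit[0]
    if op == "create":
        _, path, replication = edit
        namespace[path] = {"replication": replication}
    elif op == "rename":
        _, old_path, new_path = edit
        namespace[new_path] = namespace.pop(old_path)
    elif op == "delete":
        _, path = edit
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown edit operation: {op!r}")


def checkpoint(fsimage: Dict[str, dict], editlog: List[Tuple]) -> Tuple[Dict[str, dict], List[Tuple]]:
    """Replay the EditLog over the FsImage and return (new FsImage, emptied EditLog)."""
    namespace = copy.deepcopy(fsimage)  # in-memory representation of the FsImage
    for edit in editlog:                # apply every logged transaction in order
        apply_edit(namespace, edit)
    return namespace, []                # assumption: the log can start empty after the checkpoint
```

Replaying the edits in order matters: a rename followed by a delete of the old path gives a different result than the reverse.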
  25. Blockreport
     • When a DataNode starts up, it scans through its local file system, generates a list of all HDFS data blocks that correspond to each of these local files, and sends this report to the NameNode
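The startup scan on slide 25 can be sketched as a directory listing. This is an illustration, not DataNode code: the function name `build_blockreport` is hypothetical, the `blk_` file-name prefix is an assumption about how block files are named, and sending the report to the NameNode is left out.

```python
import os
from typing import List


def build_blockreport(data_dir: str, prefix: str = "blk_") -> List[str]:
    """List the block files found in a DataNode's local storage directory.

    Assumes each HDFS block is stored as one local file whose name starts
    with `prefix`; other files in the directory are ignored.
    """
    return sorted(
        name
        for name in os.listdir(data_dir)
        if name.startswith(prefix) and os.path.isfile(os.path.join(data_dir, name))
    )
```

The returned list is what the DataNode would send to the NameNode as its Blockreport.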
  26. Robustness
     • Cluster Rebalancing
     • Data Integrity (checksum)
     • Metadata Disk Failure
     • Snapshots
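The Data Integrity (checksum) item on slide 26 refers to detecting corrupted block data by comparing checksums. The sketch below shows the general technique using CRC32 from Python's standard `zlib` module. The 512-byte chunk size, the function names, and the choice of CRC32 are assumptions for illustration; the slides do not say which checksum or chunk size HDFS uses.

```python
import zlib
from typing import List


def chunk_checksums(data: bytes, chunk_size: int = 512) -> List[int]:
    """Compute a CRC32 checksum for each fixed-size chunk of the data."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(data: bytes, expected: List[int], chunk_size: int = 512) -> List[int]:
    """Return the indexes of chunks whose checksum differs from the stored one."""
    actual = chunk_checksums(data, chunk_size)
    if len(actual) != len(expected):
        raise ValueError("data length does not match the stored checksums")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

A client would compute the checksums when writing a block, store them alongside the data, and run the comparison when reading. If a chunk fails the check, the client can fetch the block from another replica.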
  27. References:
     1. http://www.maxi-pedia.com/what+is+DFS
     2. www.Apachi.org