HDFS, Fisher Liao, 2013/01/17
  • hflush: makes data written to a file that is still open readable. Append: reopens a closed file to add data. Portable: across hardware and software platforms.
  • Blockreport: the list of all blocks on a datanode. Rack: the namenode determines the rack ID of each datanode. Example with 3 replicas: 1 on the local rack, 1 on a remote rack, and 1 on the same remote rack but on a different node.
  • Metadata disk failure: the namenode supports multiple copies of the FsImage and EditLog; keeping the copies in sync degrades performance; recovery is manual. Snapshot (not yet supported by HDFS): for rollback.
  • Data blocks: write-once-read-many; 64 MB block size.
  • HDFS provides:
      - API: a Java API for applications, a C wrapper for the Java API, and the WebDAV protocol for HTTP browsers
      - FS Shell (CLI), e.g. bin/hadoop dfs -mkdir /foodir, bin/hadoop dfs -rmr /foodir, bin/hadoop dfs -cat /foodir/myfile.txt
      - DFSAdmin (command set for administrators), e.g. bin/hadoop dfsadmin -safemode enter (puts the cluster in safemode), bin/hadoop dfsadmin -report (generates a list of datanodes)
      - Browser: available in a typical HDFS install
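  • Delete: (1) the user deletes a file; (2) the file is renamed into /trash, from where it can be restored; (3) it remains there for 6 hours (configurable); (4) the namenode deletes it; (5) the associated blocks are freed. Undelete: possible while the file is still in /trash. Decrease replication factor: requested through setReplication; the namenode selects the excess replicas to remove.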
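The delete and undelete steps in the notes above can be illustrated with a small toy model. This is only a sketch of the lifecycle, not HDFS code: the class `ToyNamespace`, its methods, and the constant `TRASH_RETENTION_SECONDS` are hypothetical names introduced for illustration, and time is passed in explicitly as seconds.

```python
from dataclasses import dataclass, field

# 6 hours, as stated in the notes; the real value is configurable.
TRASH_RETENTION_SECONDS = 6 * 60 * 60


@dataclass
class ToyNamespace:
    """Toy model of a namenode namespace with a trash (hypothetical, not HDFS)."""
    files: dict = field(default_factory=dict)   # path -> list of block ids
    trash: dict = field(default_factory=dict)   # path -> (block ids, deletion time)
    freed_blocks: list = field(default_factory=list)

    def delete(self, path, now):
        # Steps 1-2: a user delete only moves the file into the trash.
        self.trash[path] = (self.files.pop(path), now)

    def undelete(self, path):
        # Restoring works only while the file is still in the trash.
        if path not in self.trash:
            return False
        blocks, _ = self.trash.pop(path)
        self.files[path] = blocks
        return True

    def expire_trash(self, now):
        # Steps 3-5: once the retention period has passed, the entry is
        # removed from the namespace and its blocks are freed.
        for path in list(self.trash):
            blocks, deleted_at = self.trash[path]
            if now - deleted_at >= TRASH_RETENTION_SECONDS:
                del self.trash[path]
                self.freed_blocks.extend(blocks)
```

In this model, a file restored with `undelete` before `expire_trash` runs keeps its blocks, while an expired file can no longer be restored.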
Transcript of "Hdfs"

    1. HDFS, Fisher Liao, 2013/01/17
    2. Goals: Hardware Failure; Streaming Data Access; Large Data Sets; Appending-Writes and File Syncs (Hflush, Append); Moving Computation; Portable
    3. NameNode & DataNodes: master/slave
    4. File System Namespace: replication factor
    5. Data Replication: block size and replication factor are configurable per file; the namenode receives Heartbeats and Blockreports from the datanodes; replica placement (Policy, Rack)
    6. Data Replication (Cont.): replica selection (closest to the reader); safemode (namenode): entered on startup, no replication while in safemode, exited after the namenode's data block check exceeds x%, after which replication starts
    7. Persistence of File System Metadata: EditLog; FsImage; Checkpoint; datanode: each block is stored as a file, and on startup the datanode scans its local storage and produces a Blockreport
    8. Communication Protocol: TCP/IP; ClientProtocol; DataNode Protocol
    9. Robustness: failures (NameNode failure, DataNode failure, network partitions); data disk failure, heartbeats, and re-replication; cluster rebalancing (free space, threshold); data integrity (checksum); metadata disk failure; snapshot (not yet supported by HDFS)
    10. Data Organization: data blocks; replication pipelining on write: (1) the client gets from the namenode a list of datanodes chosen by an algorithm; (2) the client writes to the 1st datanode; (3) the 1st datanode receives the data in small portions (4 KB); (4) the 1st datanode copies each portion to the 2nd datanode
    11. Accessibility: API; FS Shell; DFSAdmin; Browser
    12. Space Reclamation: Delete; Undelete; decrease replication factor
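The replication pipelining steps on slide 10 can be sketched as a chain of nodes that each store a portion and forward it downstream. This is a toy in-memory model, not HDFS code: `ToyDataNode`, `build_pipeline`, and `write_block` are hypothetical names, the datanode list is assumed to have already been obtained from the namenode, and forwarding is shown as a direct call rather than a network transfer.

```python
PORTION_SIZE = 4 * 1024  # 4 KB portions, as stated on slide 10


class ToyDataNode:
    """Toy datanode that stores portions and forwards them downstream."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.stored = bytearray()

    def receive(self, portion):
        # Store the portion locally, then pass it to the next node in the pipeline.
        self.stored.extend(portion)
        if self.downstream is not None:
            self.downstream.receive(portion)


def build_pipeline(names):
    """Chain datanodes so each forwards to the next; returns them in pipeline order."""
    nodes = []
    downstream = None
    for name in reversed(names):
        downstream = ToyDataNode(name, downstream)
        nodes.append(downstream)
    nodes.reverse()
    return nodes


def write_block(data, pipeline_head):
    """Client side: send data to the first datanode in 4 KB portions.

    Returns the number of portions sent.
    """
    count = 0
    for offset in range(0, len(data), PORTION_SIZE):
        pipeline_head.receive(data[offset:offset + PORTION_SIZE])
        count += 1
    return count
```

The client only ever talks to the first node in the list; every later replica is filled by the node before it, which is the point of the pipeline.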