Hadoop HDFS NameNode HA



Published in: Technology

  1. Anty Rao, April 10, 2011
  2. Outline
     • Architecture of HDFS
     • Available NN HA options
  3. HDFS architecture: the NameNode (NN) is a single point of failure (SPOF), so it needs some form of high availability (HA).
  4. NN HA: two main HA options are currently available:
     • AvatarNode (Facebook)
     • BackupNode (Yahoo!) (available?)
  5. AvatarNode
  6. AvatarNode (AN)
     • Active-Standby pair
       - Coordinated via ZooKeeper
       - Failover in a few seconds
       - Wrapper over NameNode
     • Active AvatarNode
       - Writes the transaction log to an NFS filer
     • Standby AvatarNode
       - Reads/consumes transactions from the NFS filer
       - Processes all messages from DataNodes
       - Keeps the latest metadata in memory
     • Diagram: the client retrieves block locations from the Primary or the Standby. The Active AvatarNode (NameNode) writes transactions to the NFS filer and the Standby AvatarNode (NameNode) reads them. DataNodes send block location messages to both AvatarNodes.
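The shared-log arrangement on this slide can be illustrated with a small simulation. This is only a sketch under stated assumptions, not AvatarNode's code: `ActiveNode`, `StandbyNode`, `catch_up`, and the `MKDIR` record format are hypothetical names, and a local temporary file stands in for the NFS filer.

```python
import os
import tempfile


class ActiveNode:
    """Hypothetical active node: logs each transaction to shared storage."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.namespace = set()

    def mkdir(self, path):
        # Persist the transaction to the shared log before applying it.
        with open(self.log_path, "ab") as log:
            log.write(f"MKDIR {path}\n".encode())
            log.flush()
            os.fsync(log.fileno())
        self.namespace.add(path)


class StandbyNode:
    """Hypothetical standby node: tails the shared log to stay current."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.namespace = set()
        self.offset = 0  # byte offset of the next unread transaction

    def catch_up(self):
        """Apply transactions written since the last call; return the count."""
        if not os.path.exists(self.log_path):
            return 0
        applied = 0
        with open(self.log_path, "rb") as log:
            log.seek(self.offset)
            for raw in iter(log.readline, b""):
                op, path = raw.decode().rstrip("\r\n").split(" ", 1)
                if op == "MKDIR":
                    self.namespace.add(path)
                applied += 1
            self.offset = log.tell()
        return applied
```

Calling `catch_up()` repeatedly keeps the standby's in-memory namespace close to the active node's, which is why only the tail of the log remains to be consumed at failover time.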
  7. Four steps to failover
     1. Wipe the ZooKeeper entry. Clients will know that a failover is in progress. (0 seconds)
     2. Stop the primary NameNode. The last bits of data are flushed to the transaction log and the node dies. (Seconds)
     3. Switch the Standby to Primary. It consumes the rest of the transaction log and leaves SafeMode, ready to serve traffic. (Seconds)
     4. Update the entry in ZooKeeper. All clients waiting for the failover pick up the new connection. (0 seconds)
     Afterwards: start the first node in Standby mode. (This takes a while, but the cluster is up and running.)
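The four steps above can be sketched as a short simulation. This is an illustration only, not the AvatarNode implementation: `Node`, `failover`, and the `active-namenode` key are hypothetical, a plain dictionary stands in for ZooKeeper, and a list stands in for the shared transaction log.

```python
class Node:
    """Hypothetical stand-in for one AvatarNode instance."""

    def __init__(self, address, role):
        self.address = address
        self.role = role                      # "primary", "standby", or "stopped"
        self.safe_mode = (role == "standby")  # the standby does not serve traffic
        self.pending = []                     # transactions not yet flushed to the log
        self.applied = 0                      # transactions applied from the shared log


def failover(zk, primary, standby, shared_log, key="active-namenode"):
    """Run the four failover steps; zk is a dict standing in for ZooKeeper."""
    # Step 1: wipe the ZooKeeper entry so clients see a failover in progress.
    zk.pop(key, None)

    # Step 2: stop the primary after it flushes its last transactions.
    shared_log.extend(primary.pending)
    primary.pending = []
    primary.role = "stopped"

    # Step 3: the standby consumes the rest of the log and leaves SafeMode.
    standby.applied = len(shared_log)
    standby.safe_mode = False
    standby.role = "primary"

    # Step 4: publish the new primary so waiting clients can reconnect.
    zk[key] = standby.address
    return standby
```

Between steps 1 and 4 the key is absent from `zk`, which is how a client in this sketch would tell that a failover is under way and that it should wait and retry.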
  8. AvatarNode @Facebook
     • Diagram from Facebook
     • Contrib@hadoop 0.20 (HDFS-976)
  9. Conclusions
     • Complete hot standby
       - NFS for storage of fsimage and edit logs (no data loss)
       - The standby node continuously consumes transactions from the edit logs on NFS (namespace hot standby)
       - DataNodes send messages to both the primary and the standby node (block reports hot standby)
     • Fast switchover
       - Less than a minute
     • Makes sense!
  10. BackupNode
  11. BackupNode (BN)
      • The NN synchronously streams the transaction log to the BackupNode
      • The BackupNode applies the log to its in-memory and on-disk image
      • The BN always commits to disk before reporting success to the NN
      • If the BN restarts, it has to catch up with the NN
      • Available in HDFS 0.20.1 release
      • Diagram: the client retrieves block locations from the NN. The NN (NameNode) synchronously streams transaction logs to the BN (BackupNode). DataNodes send block location messages to the NN.
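The synchronous streaming described on this slide can be sketched as follows. This is a toy model under stated assumptions, not HDFS code: the `NameNode` and `BackupNode` classes here, along with `receive` and `catch_up`, are hypothetical, and Python lists stand in for on-disk edit logs.

```python
class BackupNode:
    """Hypothetical BN: commits each transaction before acknowledging it."""

    def __init__(self):
        self.disk_log = []      # stands in for edits persisted on disk
        self.namespace = set()  # stands in for the in-memory image

    def receive(self, txn):
        # Commit to "disk" first, then apply to the in-memory image.
        self.disk_log.append(txn)
        op, path = txn
        if op == "mkdir":
            self.namespace.add(path)
        return True  # success is reported only after the commit

    def catch_up(self, nn_edit_log):
        """After a restart, replay the transactions this BN is missing."""
        missing = nn_edit_log[len(self.disk_log):]
        for txn in missing:
            self.receive(txn)
        return len(missing)


class NameNode:
    """Hypothetical NN: streams every transaction to the BN synchronously."""

    def __init__(self, backup):
        self.backup = backup
        self.edit_log = []
        self.namespace = set()

    def mkdir(self, path):
        txn = ("mkdir", path)
        self.edit_log.append(txn)
        # Blocks until the BN has committed the transaction.
        acked = self.backup.receive(txn)
        self.namespace.add(path)
        return acked
```

Because `mkdir` returns only after `receive` does, the BN's namespace in this sketch never lags behind the NN's. Block locations are not modeled at all, which mirrors the fact that the BN holds only the namespace.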
  12. Limitations of BackupNode (BN)
      • Maximum of one BackupNode per NN
        - Supports only two-machine failure
      • The NN doesn't forward block reports to the BackupNode
      • Time to restart from a 12 GB image, 70M files + 100M blocks
        - 3-5 minutes to read the image from disk
        - 20 minutes to process block reports
        - The BN will still take 25+ minutes to fail over!
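The restart figures on this slide add up as shown in the short calculation below. It is a sketch only: `estimate_failover_minutes` is a hypothetical helper, and it assumes the two phases run one after the other with no other overhead.

```python
def estimate_failover_minutes(image_load_min, block_report_min):
    """Sum the two sequential phases quoted on the slide."""
    return image_load_min + block_report_min


low = estimate_failover_minutes(3, 20)   # fastest image read
high = estimate_failover_minutes(5, 20)  # slowest image read
print(f"{low}-{high} minutes")
```

The upper end of this range matches the slide's "25+ minutes" figure, and block report processing accounts for most of it.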
  13. Conclusions
      • Incomplete hot standby / semi-hot standby
        - Namespace: hot standby
        - Block reports: cold standby
      • Still-slow switchover
  14. Other HA solutions
      • DRBD + Linux HA: http://www.cloudera.com/blog/2009/07/hadoop-ha-configuration/
      • Metadata backup: http://wiki.apache.org/hadoop/NameNodeFailover