Design for a Distributed Name Node



A proposed design for a distributed HDFS NameNode.

Published in: Technology


  1. Reaching 10,000. Aaron Cordova, Booz Allen Hamilton | Hadoop Meetup DC | Sep 7 2010
  2. Lots of Applications Require Scalability: Machine Learning, Text, Defense, Intelligence, Graph Analytics, Bio-Metrics, Video, Bio-Informatics, Network Security, Images, Structured Data
  3. Hadoop Scales
  4. Linear Scalability. [Chart: Cost vs. Data Size; curves: Shared Nothing, Shared Disk]
  5. Massive Parallelism
  6. MapReduce: Simplified Distributed Programming Model; Fault Tolerant; Designed to Scale to Thousands of Servers; Many Algorithms Easily Expressed as Map and Reduce
  7. HDFS: Distributed File System; Optimized for High-Throughput; Fault Tolerant Through Replication, Checksumming; Designed to Scale to 10,000 servers
  8. Hadoop is a Platform
  9. Pig, MapReduce, HBase, Cascading, Flume, HDFS, Nutch, Mahout, Hive
  10. HBase: Scalable Structured store; Fast Lookups; Durable, Consistent Writes; Automatic Partitioning
  11. Mahout: Scalable Machine Learning Algorithms; Clustering; Classification
  12. Fuzzy Table: Low-Latency Parallel Search; Generalized Fuzzy Matching; Images, Biometrics, Audio
  13. One Major Problem
  14. HDFS Single NameNode: Single NameSpace - easy to serialize operations; NameSpace stored entirely in memory; Changes written to transaction log first; Single Point of Failure; Performance Bottleneck?
  15. NameNode Scalability: “100,000 HDFS clients on a 10,000-node HDFS cluster will exceed the throughput capacity of a single name-node. ... any solution intended for single namespace server optimization lacks scalability. ... the most promising solutions seem to be based on distributing the namespace server ...” (Konstantin Shvachko, ;login:, Apr 2010)
  16. Goal. [Bar chart: writes/second (thousands), axis 0 to 50; bars: Single NN, Target]
  17. HDFS Single NameNode: Server grade machine; Lots of memory; Reliable components; RAID; Hot-Failover
  18. Needs Parallelism
  19. Scaling NameNode: Grow memory; Read-only Replicas of NameNode; Multiple static namespace partitions; Distributed name server, partition namespace dynamically
  20. Distributed NameNode Features: Fast Lookups; Durable, Consistent writes; Automatic Partitioning
  21. Can we use HBase?
  22. Mappings as HBase Tables: NameSpace (filename : blocks); DataNodes (node : blocks); Blocks (block : nodes)
  23. How to order namespace?
  24. Depth First Search Order: /, /dir1, /dir1/subdir, /dir1/subdir/file, /dir2/file1, /dir2/file2
  25. Depth First Operations: Delete (Recursive); Move / Rename
  26. Breadth First Search Order: 0/, 1/dir1, 2/dir1/subdir, 2/dir2/file1, 2/dir2/file2, 3/dir1/subdir/file
  27. Breadth First Operations: List
  28. Current Architecture. [Diagram: NameNode; DataNode, DataNode; DFSClient, DFSClient]
  29. Proposed Architecture. [Diagram: RServer x4; DNNProxy x4; DataNode, DataNode; DFSClient, DFSClient]
  30. 100k clients -> 41k writes/s
  31. Anticipated Performance. [Chart: writes/second (thousands), 0 to 50, vs. # machines hosting namespace, 100 to 250; series: Single NN, Distributed NN, Target]
  32. Issues: Synchronization - multiple writers, changes; Name distribution hotspots
  33. Current Status: Working code exists that uses HBase with a slightly modified DFSClient and DataNode for create, write, close, open, read, mkdirs, delete. New component: HealthServer, which monitors DataNodes and does garbage collection. It is more like the BigTable master: it can die and restart without affecting clients.
  34. Code: Will be at (link not preserved in this transcript). Available under the Apache license - whichever is compatible with Hadoop
  35. Doesn’t HBase run on HDFS?
  36. Self-Hosted HBase: May be possible to have HBase use the same HDFS instance it’s supporting. Some recursion and self-reference already exists: HBase Metadata table is itself a table in HBase. Have to work out bootstrapping and failure recovery to resolve any potential circular dependencies.
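
The two namespace orderings on the "Depth First Search Order" and "Breadth First Search Order" slides can be sketched as row-key encodings. This is a minimal illustration, not the author's code: a sorted Python list stands in for HBase's lexicographically sorted row keys, and the names `depth_first_key`, `breadth_first_key`, and `prefix_scan` are hypothetical. A `/dir2` row is added so the example tree is complete. Under these assumptions, a subtree is one contiguous key range in depth-first order (useful for recursive delete and rename), while the direct children of a directory are one contiguous range in breadth-first order (useful for list).

```python
from bisect import bisect_left


def depth_first_key(path):
    # Assumption: the row key is simply the full path.
    return path


def breadth_first_key(path):
    # Assumption: the row key is the path prefixed with its depth, as on the
    # slide ("0/", "1/dir1", ...). Depths of 10 or more would need
    # zero-padding to keep lexicographic order; omitted here for brevity.
    depth = 0 if path == "/" else path.count("/")
    return f"{depth}{path}"


def prefix_scan(sorted_keys, prefix):
    # Stand-in for a sorted-table range scan: return the contiguous run of
    # keys that start with `prefix`.
    start = bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


paths = [
    "/", "/dir1", "/dir1/subdir", "/dir1/subdir/file",
    "/dir2", "/dir2/file1", "/dir2/file2",
]
dfs_table = sorted(depth_first_key(p) for p in paths)
bfs_table = sorted(breadth_first_key(p) for p in paths)

# Everything under /dir1 (what a recursive delete or rename must touch).
subtree = prefix_scan(dfs_table, "/dir1/")

# Direct children of /dir2 only (what a directory listing needs).
children = prefix_scan(bfs_table, "2/dir2/")
```

The trade-off is that each ordering makes the other family of operations more expensive: a listing in depth-first order has to skip over grandchildren, and a recursive delete in breadth-first order needs one scan per depth level.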
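
The three mappings on the "Mappings as HBase Tables" slide (filename : blocks, node : blocks, block : nodes) can be sketched with plain dictionaries standing in for HBase tables. This is an assumption-laden sketch, not the working code mentioned under "Current Status": the class `NameTables`, its methods, and the block and node identifiers are all hypothetical. It shows why the node and block tables are kept as inverses of each other, so that both "where is this file?" and "what did this DataNode hold?" are single lookups.

```python
from collections import defaultdict


class NameTables:
    """In-memory stand-in for the three tables; names are hypothetical."""

    def __init__(self):
        self.namespace = defaultdict(list)  # filename -> ordered block ids
        self.datanodes = defaultdict(set)   # node -> block ids it stores
        self.blocks = defaultdict(set)      # block id -> nodes holding it

    def add_block(self, filename, block_id, nodes):
        # Append the block to the file and update both inverse mappings.
        self.namespace[filename].append(block_id)
        for node in nodes:
            self.datanodes[node].add(block_id)
            self.blocks[block_id].add(node)

    def locate(self, filename):
        # Resolve a file to (block id, replica locations) pairs.
        return [
            (block_id, sorted(self.blocks.get(block_id, set())))
            for block_id in self.namespace.get(filename, [])
        ]

    def remove_node(self, node):
        # Drop a DataNode and report which blocks lost a replica.
        lost = self.datanodes.pop(node, set())
        for block_id in lost:
            self.blocks[block_id].discard(node)
        return sorted(lost)


tables = NameTables()
tables.add_block("/dir1/subdir/file", "blk_1", ["dn1", "dn2"])
tables.add_block("/dir1/subdir/file", "blk_2", ["dn2", "dn3"])

before = tables.locate("/dir1/subdir/file")
affected = tables.remove_node("dn2")
after = tables.locate("/dir1/subdir/file")
```

In a real deployment each `add_block` would span rows in several tables, which is where the synchronization concern from the "Issues" slide comes in; the sketch ignores concurrency entirely.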