Design for a Distributed Name Node
A proposed design for a distributed HDFS NameNode.

Presentation Transcript

  • Reaching 10,000. Aaron Cordova, Booz Allen Hamilton | Hadoop Meetup DC | Sep 7, 2010
  • Lots of Applications Require Scalability: Machine Learning, Text, Defense Intelligence, Graph Analytics, Biometrics, Video, Bioinformatics, Network Security, Images, Structured Data
  • Hadoop Scales
  • Linear Scalability [chart: cost vs. data size, Shared Nothing vs. Shared Disk]
  • Massive Parallelism
  • MapReduce: Simplified Distributed Programming Model; Fault Tolerant; Designed to Scale to Thousands of Servers; Many Algorithms Easily Expressed as Map and Reduce
  • HDFS: Distributed File System; Optimized for High Throughput; Fault Tolerant Through Replication and Checksumming; Designed to Scale to 10,000 Servers
  • Hadoop is a Platform
  • Pig, MapReduce, HBase, Cascading, Flume, HDFS, Nutch, Mahout, Hive
  • HBase: Scalable Structured Store; Fast Lookups; Durable, Consistent Writes; Automatic Partitioning
  • Mahout: Scalable Machine Learning Algorithms; Clustering; Classification
  • Fuzzy Table: Low-Latency Parallel Search; Generalized Fuzzy Matching; Images, Biometrics, Audio
  • One Major Problem
  • HDFS Single NameNode: Single NameSpace (easy to serialize operations); NameSpace Stored Entirely in Memory; Changes Written to a Transaction Log First; Single Point of Failure; Performance Bottleneck?
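The write path the slide describes (durably log the change first, then update the in-memory namespace) can be sketched as follows. This is an illustrative sketch, not the actual HDFS NameNode internals; all class and method names are assumptions.

```python
# Illustrative sketch of the single-NameNode write path: every namespace
# mutation is appended to an on-disk transaction log before the in-memory
# namespace map is updated. Names here are made up, not HDFS internals.
import os
import tempfile

class SingleNameNode:
    def __init__(self, log_path):
        self.namespace = {}             # entire namespace held in memory
        self.log = open(log_path, "a")  # write-ahead transaction log

    def create(self, path, blocks):
        # 1. Durably record the change first ...
        self.log.write(f"CREATE {path} {','.join(blocks)}\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. ... then apply it to the in-memory namespace.
        self.namespace[path] = list(blocks)

log_path = os.path.join(tempfile.mkdtemp(), "edits.log")
nn = SingleNameNode(log_path)
nn.create("/dir1/file", ["blk_1", "blk_2"])
```

Because every mutation funnels through this one log and one in-memory map, the namespace is easy to keep consistent, which is exactly why it is also a single point of failure and a potential throughput bottleneck.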
  • NameNode Scalability: “100,000 HDFS clients on a 10,000-node HDFS cluster will exceed the throughput capacity of a single name-node. ... any solution intended for single namespace server optimization lacks scalability. ... the most promising solutions seem to be based on distributing the namespace server ...” (Konstantin Shvachko, ;login:, Apr 2010)
  • Goal [chart: writes/second (thousands), Single NN vs. Target]
  • HDFS Single NameNode: Server-Grade Machine; Lots of Memory; Reliable Components; RAID; Hot Failover
  • Needs Parallelism
  • Scaling the NameNode: Grow Memory; Read-Only Replicas of the NameNode; Multiple Static Namespace Partitions; a Distributed Name Server That Partitions the Namespace Dynamically
  • Distributed NameNode Features: Fast Lookups; Durable, Consistent Writes; Automatic Partitioning
  • Can we use HBase?
  • Mappings as HBase Tables: NameSpace (filename -> blocks); DataNodes (node -> blocks); Blocks (block -> nodes)
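A minimal sketch of the three proposed mappings, with plain Python dicts standing in for the HBase tables; the row keys, block IDs, and node names are made up for illustration.

```python
# Plain dicts standing in for the three proposed HBase tables.

namespace = {                   # NameSpace table: filename -> blocks
    "/dir1/file": ["blk_1", "blk_2"],
}
datanodes = {                   # DataNodes table: node -> blocks it holds
    "dn-01": {"blk_1"},
    "dn-02": {"blk_1", "blk_2"},
}
blocks = {                      # Blocks table: block -> nodes with a replica
    "blk_1": {"dn-01", "dn-02"},
    "blk_2": {"dn-02"},
}

def locate(path):
    """Open a file: look up its blocks, then the nodes holding each block."""
    return {b: blocks[b] for b in namespace[path]}
```

Keeping both directions (node -> blocks and block -> nodes) means replica placement and DataNode failure handling are lookups rather than scans, at the cost of updating two tables per block change.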
  • How to order namespace?
  • Depth First Search Order: /, /dir1, /dir1/subdir, /dir1/subdir/file, /dir2/file1, /dir2/file2
  • Depth First Operations: Delete (Recursive); Move / Rename
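Why depth-first key order suits recursive delete and move/rename: sorting plain paths lexicographically keeps each subtree in one contiguous key range, so these operations become range scans. A small sketch, with a sorted list standing in for the HBase table:

```python
# With plain paths as row keys, lexicographic sort order is depth-first
# order: a subtree occupies one contiguous key range, so recursive delete
# and move/rename become range scans rather than per-entry lookups.

keys = sorted([
    "/",
    "/dir2/file1",
    "/dir1/subdir/file",
    "/dir1",
    "/dir2/file2",
    "/dir1/subdir",
])

def subtree_range(keys, d):
    """All keys in the subtree rooted at d -- a contiguous slice of keys."""
    return [k for k in keys if k == d or k.startswith(d + "/")]

# Recursive delete of /dir1 touches exactly this contiguous range.
to_delete = subtree_range(keys, "/dir1")
```

A rename of /dir1 is the same range scan plus a rewrite of each key's prefix.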
  • Breadth First Search Order: 0/, 1/dir1, 2/dir1/subdir, 2/dir2/file1, 2/dir2/file2, 3/dir1/subdir/file
  • Breadth First Operations: List
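Prefixing each row key with its depth is what produces the breadth-first order above, and it makes listing a directory's immediate children one contiguous scan. A sketch, assuming single-digit depths (a real table would need a fixed-width or binary depth prefix so depth 10 does not sort before depth 2):

```python
# Depth-prefixed row keys sort into breadth-first order; a directory's
# immediate children all share the key prefix "<depth+1><dir>/", so List
# is a single contiguous scan.

def bfs_key(path):
    depth = 0 if path == "/" else path.count("/")
    return f"{depth}{path}"

paths = ["/", "/dir1", "/dir2/file1", "/dir2/file2",
         "/dir1/subdir", "/dir1/subdir/file"]
keys = sorted(bfs_key(p) for p in paths)

def list_dir(d):
    """Immediate children of d: entries at depth(d) + 1 under prefix d."""
    depth = 0 if d == "/" else d.count("/")
    prefix = f"{depth + 1}{d.rstrip('/')}/"
    return [k for k in keys if k.startswith(prefix)]
```

The trade-off against depth-first order: List becomes a single range scan, but a subtree is no longer contiguous, so recursive delete and rename must visit one range per depth level.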
  • Current Architecture [diagram: DFSClients issue namespace requests to a single NameNode and read/write data on DataNodes]
  • Proposed Architecture [diagram: DFSClients talk to DNNProxy instances, which store the namespace in region servers (RServers); DataNodes serve data as before]
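One hedged sketch of the proposed flow: a DFSClient calls a DNNProxy, which translates namespace operations into gets and puts against the region servers hosting the namespace. All class and method names here are assumptions, and the real design partitions by contiguous key ranges (HBase regions); a hash partition is used below only for brevity.

```python
# Illustrative sketch of DFSClient -> DNNProxy -> region servers.

class RServer:
    """Stands in for a region server hosting part of the namespace table."""
    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)

class DNNProxy:
    """Accepts NameNode-style calls and routes each path to a region.
    Note: hash routing for brevity; HBase actually routes by key range."""
    def __init__(self, regions):
        self.regions = regions

    def _region(self, path):
        return self.regions[hash(path) % len(self.regions)]

    def create(self, path, blocks):
        self._region(path).put(path, list(blocks))

    def get_block_locations(self, path):
        return self._region(path).get(path)

proxy = DNNProxy([RServer() for _ in range(4)])
proxy.create("/dir1/file", ["blk_1"])
```

Because the proxies are stateless translators, any number of them can run in parallel; capacity grows by adding region servers rather than by growing one NameNode's heap.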
  • 100k clients -> 41k writes/s
  • Anticipated Performance [chart: writes/second (thousands) vs. number of machines hosting the namespace (100 to 250); Single NN, Distributed NN, and Target]
  • Issues: Synchronization (multiple writers, changes); Name Distribution Hotspots
  • Current Status: Working code exists that uses HBase, with slightly modified DFSClient and DataNode, supporting create, write, close, open, read, mkdirs, and delete. New component: a HealthServer that monitors DataNodes and performs garbage collection. It is more like the BigTable master: it can die and restart without affecting clients.
  • Code: Will be at ...; Available under the Apache license, whichever is compatible with Hadoop
  • Doesn’t HBase run on HDFS?
  • Self-Hosted HBase: It may be possible to have HBase use the same HDFS instance it is supporting. Some recursion and self-reference already exists: the HBase metadata table is itself a table in HBase. Bootstrapping and failure recovery would have to be worked out to resolve any potential circular dependencies.