Design for a Distributed Name Node

A proposed design for a distributed HDFS NameNode.

Reaching 10,000
Aaron Cordova
Booz Allen Hamilton | Hadoop Meetup DC | Sep 7 2010
cordova_aaron@bah.com

Lots of Applications Require Scalability
Machine Learning, Text, Defense, Intelligence, Graph Analytics, Bio-Metrics, Video, Bio-Informatics, Network Security, Images, Structured Data

Hadoop Scales

Linear Scalability
[Chart: Cost (vertical) against Data Size (horizontal), comparing Shared Nothing and Shared Disk]

Massive Parallelism

MapReduce
Simplified Distributed Programming Model
Fault Tolerant
Designed to Scale to Thousands of Servers
Many Algorithms Easily Expressed as Map and Reduce
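
The claim that many algorithms are easily expressed as map and reduce can be illustrated with the usual word-count example. This is a single-process sketch of the programming model only: it does not use Hadoop's API, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> tuple[str, int]:
    # Reduce: combine every value that shares a key.
    return word, sum(counts)


def run_mapreduce(records: Iterable[str]) -> dict[str, int]:
    # Shuffle: group intermediate pairs by key. A real framework does this
    # across machines between the two phases; here it is one dictionary.
    groups: dict[str, list[int]] = defaultdict(list)
    for record in records:
        for key, value in map_words(record):
            groups[key].append(value)
    return dict(reduce_counts(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    print(run_mapreduce(["the name node", "the data node"]))
```

Because map calls are independent per record and reduce calls are independent per key, both phases can be spread across many servers and simply re-run on failure.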

HDFS
Distributed File System
Optimized for High-Throughput
Fault Tolerant Through Replication, Checksumming
Designed to Scale to 10,000 servers
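
The two fault-tolerance mechanisms named on the HDFS slide, replication and checksumming, can be sketched in a few lines: a file is cut into blocks, each block gets a CRC32 checksum per fixed-size chunk, and each block is assigned to several DataNodes. The block and chunk sizes, node names, and round-robin placement are illustrative assumptions; HDFS's own sizes are configurable and its placement policy is rack-aware.

```python
import zlib


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    # Cut a file into fixed-size blocks; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def chunk_checksums(block: bytes, chunk_size: int) -> list[int]:
    # One CRC32 per chunk, so corruption can be located within a block.
    return [zlib.crc32(block[i:i + chunk_size]) for i in range(0, len(block), chunk_size)]


def verify(block: bytes, checksums: list[int], chunk_size: int) -> bool:
    # A reader recomputes the checksums; on a mismatch it would try another replica.
    return chunk_checksums(block, chunk_size) == checksums


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> list[list[str]]:
    # Naive round-robin placement on distinct nodes (illustrative, not HDFS's policy).
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return [[nodes[(b + r) % len(nodes)] for r in range(replication)] for b in range(num_blocks)]
```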

Hadoop is a Platform

[Diagram: Pig, MapReduce, HBase, Cascading, Flume, HDFS, Nutch, Mahout, Hive]

HBase
Scalable Structured Store
Fast Lookups
Durable, Consistent Writes
Automatic Partitioning
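
HBase gets fast lookups and automatic partitioning by keeping rows sorted by key and dividing the key space into contiguous regions that split as they grow. The class below is a hypothetical single-process model of that idea (RegionMap and max_rows_per_region are invented names), not HBase's API; real HBase triggers splits on data size rather than row count and assigns regions to region servers.

```python
import bisect


class RegionMap:
    """Hypothetical model of range partitioning: region i holds the row keys
    from starts[i] up to (not including) starts[i + 1]."""

    def __init__(self, max_rows_per_region: int) -> None:
        self.max_rows = max_rows_per_region
        self.starts: list[str] = [""]       # one region initially covers every key
        self.rows: list[list[str]] = [[]]   # sorted row keys held by each region

    def region_for(self, key: str) -> int:
        # Fast lookup: binary search over the region start keys.
        return bisect.bisect_right(self.starts, key) - 1

    def put(self, key: str) -> None:
        i = self.region_for(key)
        region = self.rows[i]
        pos = bisect.bisect_left(region, key)
        if pos < len(region) and region[pos] == key:
            return  # row already exists
        region.insert(pos, key)
        if len(region) > self.max_rows:
            self._split(i)

    def _split(self, i: int) -> None:
        # Automatic partitioning: cut an oversized region at its middle key.
        region = self.rows[i]
        mid = len(region) // 2
        self.rows[i:i + 1] = [region[:mid], region[mid:]]
        self.starts.insert(i + 1, region[mid])
```

No one has to choose the split points in advance; they follow the data, which is the property the later "Distributed NameNode Features" slide asks for.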

Mahout
Scalable Machine Learning Algorithms
Clustering
Classification

Fuzzy Table
Low-Latency Parallel Search
Generalized Fuzzy Matching
Images, Biometrics, Audio

One Major Problem

HDFS Single NameNode
Single NameSpace - easy to serialize operations
NameSpace stored entirely in memory
Changes written to transaction log first
Single Point of Failure
Performance Bottleneck?
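
"NameSpace stored entirely in memory" plus "changes written to transaction log first" is the write-ahead logging pattern: every change is made durable in an append-only log before the in-memory state is updated, and a restart rebuilds memory by replaying the log. The sketch below assumes a hypothetical one-JSON-record-per-line log and invented operation names; it is not the format of HDFS's own edit log.

```python
import json
import os
from pathlib import Path


class Namespace:
    """Hypothetical in-memory namespace with write-ahead logging."""

    def __init__(self, log_path: Path) -> None:
        self.log_path = log_path
        self.files: dict[str, list[str]] = {}  # path -> block IDs
        self._replay()

    def _replay(self) -> None:
        # On restart, rebuild the in-memory state by re-applying the log in order.
        if self.log_path.exists():
            for line in self.log_path.read_text().splitlines():
                self._apply(json.loads(line))

    def _apply(self, op: dict) -> None:
        if op["op"] == "create":
            self.files[op["path"]] = []
        elif op["op"] == "add_block":
            self.files[op["path"]].append(op["block"])
        elif op["op"] == "delete":
            self.files.pop(op["path"], None)

    def _log_then_apply(self, op: dict) -> None:
        # The record must reach stable storage before memory is changed.
        with self.log_path.open("a") as log:
            log.write(json.dumps(op) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self._apply(op)

    def create(self, path: str) -> None:
        self._log_then_apply({"op": "create", "path": path})

    def add_block(self, path: str, block: str) -> None:
        if path not in self.files:
            raise KeyError(path)  # validate first so the log holds only valid ops
        self._log_then_apply({"op": "add_block", "path": path, "block": block})

    def delete(self, path: str) -> None:
        self._log_then_apply({"op": "delete", "path": path})
```

Every mutation passes through one process and one log, which makes operations easy to serialize but also bounds throughput, the trade-off the slide's closing question points at.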

NameNode Scalability
“100,000 HDFS clients on a 10,000-node HDFS cluster will exceed the throughput capacity of a single name-node. ... any solution intended for single namespace server optimization lacks scalability. ... the most promising solutions seem to be based on distributing the namespace server ...”
Konstantin Shvachko, ;login:, Apr 2010

Goal
[Chart: writes/second (thousands), scale 0 to 50, for Single NN and Target]

HDFS Single NameNode
Server grade machine
Lots of memory
Reliable components
RAID
Hot-Failover

Needs Parallelism

Scaling NameNode
Grow memory
Read-only Replicas of NameNode
Multiple static namespace partitions
Distributed name server, partition namespace dynamically
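
The "multiple static namespace partitions" option can be pictured as a fixed table from path prefixes to namespace servers, resolved by longest-prefix match. This is a hypothetical sketch; the function names and the server names nn0, nn1, nn2 are made up for illustration.

```python
def build_mount_table(table: dict[str, str]) -> list[tuple[str, str]]:
    # Hypothetical fixed partitioning: check longer (more specific) prefixes first.
    return sorted(table.items(), key=lambda item: len(item[0]), reverse=True)


def server_for(path: str, mounts: list[tuple[str, str]]) -> str:
    """Return the namespace server that owns `path` under the fixed table."""
    for prefix, server in mounts:
        if prefix == "/" or path == prefix or path.startswith(prefix + "/"):
            return server
    raise KeyError(f"no partition covers {path}")


if __name__ == "__main__":
    mounts = build_mount_table({"/": "nn0", "/user": "nn1", "/data": "nn2"})
    print(server_for("/user/alice/part-0", mounts))
```

The table never changes on its own, so a subtree that grows busy stays on one server until an operator repartitions it. The last option on the slide instead moves partition boundaries as the namespace grows, which is what HBase's automatic partitioning provides.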

Distributed NameNode Features
Fast Lookups
Durable, Consistent writes
Automatic Partitioning

Can we use HBase?

Mappings as HBase Tables
NameSpace (filename : blocks)
DataNodes (node : blocks)
Blocks (block : nodes)
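
The three mappings can be modelled with plain dictionaries standing in for HBase tables, keyed by row. BlockTables and its methods are hypothetical names for this sketch; a real implementation would issue HBase gets and puts instead of touching dictionaries.

```python
from collections import defaultdict


class BlockTables:
    """Dictionaries standing in for the three tables on the slide
    (hypothetical sketch, not HBase client code)."""

    def __init__(self) -> None:
        self.namespace: dict[str, list[str]] = {}               # filename -> blocks
        self.datanodes: dict[str, set[str]] = defaultdict(set)  # node -> blocks
        self.blocks: dict[str, set[str]] = defaultdict(set)     # block -> nodes

    def add_block(self, filename: str, block: str) -> None:
        self.namespace.setdefault(filename, []).append(block)

    def record_replica(self, block: str, node: str) -> None:
        # Write both directions so either lookup is a single-row read.
        self.datanodes[node].add(block)
        self.blocks[block].add(node)

    def locations(self, filename: str) -> list[tuple[str, list[str]]]:
        # Open/read: NameSpace row gives block IDs, then Blocks rows give nodes.
        return [(b, sorted(self.blocks.get(b, set()))) for b in self.namespace[filename]]

    def remove_node(self, node: str) -> set[str]:
        # A lost DataNode: its row lists exactly which Blocks rows to update.
        lost = self.datanodes.pop(node, set())
        for block in lost:
            self.blocks[block].discard(node)
        return lost
```

Keeping both the node-to-blocks and block-to-nodes directions costs a second write per replica, but turns both "where is this block?" and "what did this node hold?" into single-row lookups.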

How to order namespace?

Depth First Search Order
/
/dir1
/dir1/subdir
/dir1/subdir/file
/dir2/file1
/dir2/file2

Depth First Operations
Delete (Recursive)
Move / Rename
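
One way to get depth-first order (an assumption, since the slides do not give the row-key encoding) is to use each full path as the row key. In sorted order, every key that starts with "<dir>/" falls in one contiguous run, so recursive delete and rename become range operations over that run plus the directory's own row. The helper names are hypothetical; the example tree is the one on the slide, with /dir2 given its own row and a sibling /dir1-x added because "-" sorts before "/".

```python
import bisect


def descendant_bounds(keys: list[str], directory: str) -> tuple[int, int]:
    """[lo, hi) indices of every key below `directory` in a sorted key list."""
    prefix = directory.rstrip("/") + "/"
    lo = bisect.bisect_left(keys, prefix)
    # "0" is the character right after "/", so the range ["<dir>/", "<dir>0")
    # holds exactly the keys that start with "<dir>/".
    hi = bisect.bisect_left(keys, prefix[:-1] + "0")
    return lo, hi


def delete_recursive(keys: list[str], directory: str) -> None:
    lo, hi = descendant_bounds(keys, directory)
    del keys[lo:hi]                    # one range delete for the whole subtree
    i = bisect.bisect_left(keys, directory)
    if i < len(keys) and keys[i] == directory:
        del keys[i]                    # the directory's own row


def rename(keys: list[str], src: str, dst: str) -> None:
    lo, hi = descendant_bounds(keys, src)
    moved = [dst + k[len(src):] for k in keys[lo:hi]]
    del keys[lo:hi]
    i = bisect.bisect_left(keys, src)
    if i < len(keys) and keys[i] == src:
        del keys[i]
        moved.append(dst)
    for key in moved:                  # rewritten keys land in their new positions
        bisect.insort(keys, key)
```

Note that a rename still rewrites every key in the subtree, because the path is part of each key; the ordering only guarantees the rows to touch are found with one range scan.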

Breadth First Search Order
0/
1/dir1
2/dir2/file1
2/dir2/file2
2/dir1/subdir
3/dir1/subdir/file

Breadth First Operations
List
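
With the depth in front of the path, as in the breadth-first keys on the slide, all direct children of a directory share the prefix "<depth+1><dir>/", so List is a single prefix scan that never touches grandchildren. The sketch keeps the slide's unpadded depth; prefix scans still work past depth 9, though a full breadth-first traversal over sorted keys would need zero-padded depths (an assumption, not something the slides specify). The function names are hypothetical, and /dir2 is given its own row here.

```python
import bisect


def depth(path: str) -> int:
    # "/" has depth 0, "/dir1" depth 1, "/dir2/file1" depth 2.
    return 0 if path == "/" else path.count("/")


def bfs_key(path: str) -> str:
    # Row key in the slide's format: the depth, then the full path.
    return f"{depth(path)}{path}"


def list_dir(keys: list[str], directory: str) -> list[str]:
    """Direct children of `directory`, read with one prefix scan of sorted keys."""
    child_depth = depth(directory) + 1
    base = "" if directory == "/" else directory
    prefix = f"{child_depth}{base}/"
    children = []
    for key in keys[bisect.bisect_left(keys, prefix):]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        children.append(key[len(str(child_depth)):])
    return children
```

The trade-off mirrors the two slides: this layout makes List a contiguous read but scatters a subtree across one key range per depth, while the depth-first layout does the opposite.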

Current Architecture
[Diagram: one NameNode; below it DataNode, DataNode, DFSClient, DFSClient]

Proposed Architecture
[Diagram: a row of four RServers, a row of four DNNProxy instances, then DataNode, DataNode, DFSClient, DFSClient]

100k clients -> 41k writes/s
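
Read as an aggregate load (an interpretation; the slide gives only the two totals), the figure corresponds to a modest per-client rate:

$$\frac{41{,}000\ \text{writes/s}}{100{,}000\ \text{clients}} = 0.41\ \text{writes/s per client}, \qquad \frac{1}{0.41} \approx 2.4\ \text{s between writes per client}$$

No individual client is demanding; the problem is funnelling the aggregate through a single NameNode.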

Anticipated Performance
[Chart: writes/second (thousands), scale 0 to 50, against # machines hosting namespace (100 to 250), for Single NN, Distributed NN, and Target]

Issues
Synchronization - multiple writers, changes
Name distribution hotspots
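
One common way to serialize multiple writers on a row-oriented store is an atomic compare-and-set on a single row; HBase exposes this kind of single-row operation as checkAndPut (checkAndMutate in later releases). The sketch below models it in memory with a lock and does not call HBase. RowStore, create_file, and the "lease:<client>" value format are hypothetical.

```python
import threading
from typing import Optional


class RowStore:
    """Hypothetical table supporting an atomic compare-and-set on one row."""

    def __init__(self) -> None:
        self._rows: dict[str, str] = {}
        self._lock = threading.Lock()

    def check_and_put(self, row: str, expected: Optional[str], value: str) -> bool:
        # Atomically write `value` only if the row currently equals `expected`
        # (None means the row must not exist yet).
        with self._lock:
            if self._rows.get(row) != expected:
                return False
            self._rows[row] = value
            return True

    def get(self, row: str) -> Optional[str]:
        with self._lock:
            return self._rows.get(row)


def create_file(store: RowStore, path: str, client: str) -> bool:
    # Writers racing to create the same path: exactly one check_and_put wins.
    return store.check_and_put(path, None, f"lease:{client}")
```

This only covers changes confined to one row; an operation that touches several rows, such as a rename across directories, needs more coordination than a single-row check.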

Current Status
Working code exists that uses HBase with slightly modified DFSClient and DataNode for create, write, close, open, read, mkdirs, and delete.
New component: HealthServer monitors DataNodes and does garbage collection.
It works more like the BigTable master: it can die and restart without affecting clients.
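
The slide describes the HealthServer only at a high level. Below is a minimal sketch of two jobs it might perform, assuming heartbeat-based liveness: flagging DataNodes that have gone quiet, and finding blocks that are under-replicated or referenced by no file. The field names, timeout, replication target, and heartbeat mechanism are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HealthServer:
    """Hypothetical monitor; heartbeat times are soft state rebuilt after restart."""
    timeout_s: float
    replication: int
    started_at: float = 0.0
    last_heartbeat: dict[str, float] = field(default_factory=dict)  # node -> time

    def heartbeat(self, node: str, now: float) -> None:
        self.last_heartbeat[node] = now

    def dead_nodes(self, known_nodes: set[str], now: float) -> set[str]:
        # Nodes not heard from since start-up are timed from `started_at`,
        # so a fresh instance gives them one full timeout to check in.
        return {n for n in known_nodes
                if now - self.last_heartbeat.get(n, self.started_at) > self.timeout_s}

    def under_replicated(self, block_nodes: dict[str, set[str]], now: float) -> dict[str, int]:
        # Blocks whose live replica count fell below target, mapped to the shortfall.
        dead = self.dead_nodes(set().union(*block_nodes.values()), now)
        short = {}
        for block, nodes in block_nodes.items():
            live = len(nodes - dead)
            if live < self.replication:
                short[block] = self.replication - live
        return short

    @staticmethod
    def garbage_blocks(namespace: dict[str, list[str]], block_nodes: dict[str, set[str]]) -> set[str]:
        # Garbage collection: blocks stored on DataNodes that no file references.
        referenced = {b for blocks in namespace.values() for b in blocks}
        return set(block_nodes) - referenced
```

Since the file and block mappings live in the tables and heartbeat times are rebuilt from incoming heartbeats, a restarted instance loses nothing that clients depend on, in line with the slide's comparison to the BigTable master.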

Code
Will be at http://code.google.com/p/hdfs-dnn
Available under whichever version of the Apache License is compatible with Hadoop

Doesn’t HBase run on HDFS?

Self-Hosted HBase
May be possible to have HBase use the same HDFS instance it is supporting
Some recursion and self-reference already exists: the HBase metadata table is itself a table in HBase
Have to work out bootstrapping and failure recovery to resolve any potential circular dependencies