March 2011 HUG: HDFS Federation
Slides from the HDFS Federation talk at the March 2011 HUG by Suresh Srinivas.

Transcript

  • 1. HDFS Federation
    Suresh Srinivas
    Yahoo! Inc
  • 2. Single Namenode Limitations
    Namespace
    The NN process stores the entire metadata in memory
    The number of objects (files + blocks) is limited by the heap size
    A 50GB heap for 200 million objects supports 4000 DNs and 12 PB of storage at a 40 MB average file size
    Storage growth: DN storage from 4TB to 36TB; cluster size to 8000 DNs => storage from 12PB to > 100PB
    Performance
    File system operations limited to a single NN's throughput
    Bottleneck for the next generation of MapReduce
    Isolation
    Experimental apps can affect production apps
    Cluster Availability
    Failure of the single namenode brings down the entire cluster
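The capacity figure above can be sanity-checked with quick arithmetic. This is a sketch: the 3x replication factor and the one-block-per-file split of the 200M objects are my assumptions, not stated on the slide.

```python
# Back-of-envelope check of the slide's "200M objects -> 12 PB" figure.
# Assumptions (not on the slide): the 200M objects split into ~100M files
# plus ~100M blocks (one block per 40 MB file), and 3x replication.
files = 100_000_000        # ~half of the 200M files + blocks
avg_file_mb = 40           # average file size from the slide
replication = 3            # assumed HDFS default replication factor

raw_pb = files * avg_file_mb / 1e9          # MB -> PB (decimal units)
physical_pb = raw_pb * replication          # storage actually consumed

print(raw_pb, physical_pb)                  # 4.0 logical PB, 12.0 physical PB
```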
  • 3. Scaling the Name Service
    (Chart, noted as not to scale: # clients (1x to 100x) plotted against # names (100M to 20B). Approaches shown: all NS in memory (the 1x baseline, ~200M names); archives; partial NS (cache) in memory; namespace volumes, with all or partial NS in memory; separating block management from the NN; and a distributed Namenode. Chart notes: block reports for billions of blocks require rethinking the block layer; namespace volumes give good isolation properties.)
  • 4. Why Vertical Scaling Is Not Sufficient
    Why not use NNs with 512GB of memory?
    Startup time is huge: currently 30 minutes to 2 hours for a 50GB NN heap
    Stop-the-world GC pauses can bring down the cluster
    All DNs could be declared dead
    Debugging problems with a large JVM heap is harder
    Optimizing NN memory usage is expensive
    Changes in trunk reduce memory used, at the cost of development time and code complexity
    Diminishing returns
  • 5. Why Federation?
    Simplicity
    Simpler, more robust design
    Multiple independent namenodes
    Core development in 3.5 months
    Changes mostly in the Datanode, config and tools
    Very little change in the Namenode
    Simpler implementation than a Distributed Namenode
    Less scalable, but serves the immediate needs
    Federation is an optional feature
    Existing single-NN configurations are supported as is
  • 6. HDFS Background
    (Diagram: a Namenode containing the namespace and block management layers, over Datanodes providing physical storage.)
    HDFS has 2 main layers
    Namespace management
    Manages the namespace consisting of directories, files and blocks
    Supports file system operations such as create/modify/list files & dirs
    Block storage
    Block management
    Manages DN membership
    Supports add/delete/modify/get block location
    Manages replication and replica placement
    Physical storage
    Supports read/write access to blocks
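The two layers described above can be modeled as plain data structures. This is a minimal sketch, not HDFS code; all class and method names are invented: the namespace layer maps paths to block IDs, and the block layer maps block IDs to datanode locations.

```python
# Toy model of HDFS's two layers (illustrative only, not the real code).
class BlockManager:
    """Block storage layer: block membership and replica locations."""
    def __init__(self):
        self.locations = {}                      # block id -> set of DN names

    def add_replica(self, block_id, datanode):
        self.locations.setdefault(block_id, set()).add(datanode)

    def get_locations(self, block_id):
        return self.locations.get(block_id, set())


class Namespace:
    """Namespace layer: directories/files mapped to lists of blocks."""
    def __init__(self, block_manager):
        self.files = {}                          # path -> ordered block ids
        self.blocks = block_manager

    def create(self, path, block_ids):
        self.files[path] = list(block_ids)

    def open(self, path):
        # A read resolves path -> blocks (namespace layer),
        # then blocks -> datanodes (block layer).
        return [(b, self.blocks.get_locations(b)) for b in self.files[path]]
```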
  • 7. Federation
    (Diagram: Namenodes NN-1 … NN-k … NN-n, each managing its own namespace (NS1 … NS k … foreign NS n) and its own block pool; Datanodes 1 … m store blocks for all the pools, and a balancer runs across the common storage.)
    Multiple independent namenodes/namespaces in a cluster
    NNs provide both namespace and block management
    DNs form a common storage layer
    Each DN stores blocks for all the block pools
    Non-HDFS namespaces can share the same storage
  • Federated HDFS Cluster
    Current HDFS:
    1 namespace; 1 set of blocks
    Implemented as 1 Namenode and a set of datanodes
    Federated HDFS:
    Multiple independent namespaces; a namespace uses 1 block pool
    Multiple independent sets of blocks; a block pool is the set of blocks belonging to a single namespace
    Implemented as multiple Namenodes and a set of datanodes; each datanode stores blocks for all block pools
  • 12. Datanode Changes
    A thread per NN
    Registers with all the NNs
    Sends periodic heartbeats to all the NNs with a utilization summary
    Sends block reports to each NN for its block pool
    NNs can be added/removed/upgraded on the fly
    Block Pools
    Automatically created when the DN talks to an NN
    A block is identified by ExtendedBlockID = BlockPoolID + BlockID
    Block Pool IDs are unique across clusters, which enables merging clusters
    DN data structures are “indexed” by BPID
    BlockMap, storage, etc. indexed by BPID
    Upgrade/rollback happens per block pool/per NN
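The ExtendedBlockID point above can be made concrete with a small sketch (invented names, not the real org.apache.hadoop classes) of a datanode-side replica map keyed by block pool ID:

```python
from collections import namedtuple

# After federation a block is named by (block pool ID, block ID).
ExtendedBlock = namedtuple("ExtendedBlock", ["block_pool_id", "block_id"])

class DatanodeBlockMap:
    """DN-side replica map 'indexed' by BPID, as the slide describes."""
    def __init__(self):
        self.by_pool = {}                  # BPID -> {block id -> replica path}

    def add_replica(self, block, path):
        pool = self.by_pool.setdefault(block.block_pool_id, {})
        pool[block.block_id] = path

    def block_report(self, block_pool_id):
        # Each NN receives a report covering only its own block pool.
        return sorted(self.by_pool.get(block_pool_id, {}))
```

Note that the same numeric block ID can exist in two different pools without conflict, since every lookup is scoped by BPID first.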
  • 13. Other Changes
    Decommissioning
    Tools to initiate and monitor decommissioning at all the NNs
    Balancer
    Allows balancing at the datanode or block pool level
    Datanode daemons
    Disk scanner and directory scanner adapted to federation
    NN Web UI
    Additionally shows the NN's block pool storage utilization
  • 14. New Cluster Manager Web UI
    Cluster Summary
    Shows overall cluster storage utilization
    List of namenodes
    For each NN: BPID, storage utilization, number of missing blocks, number of live & dead DNs
    Link to each NN's Web UI
    Decommissioning status of DNs
  • 15. Managing Namespaces
    (Diagram: a client-side mount table rooted at / with mount points tmp, home, project and data.)
    Federation has multiple namespaces – don't you need a single global namespace?
    The key is to share the data and the names used to access the shared data
    A global namespace is one way to do that – but even there we talk of several large “global” namespaces
    A client-side mount table is another way to share
    Shared mount table => “global” shared view
    Personalized mount table => per-application view
    Share the data that matters by mounting it
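A client-side mount table can be sketched as a longest-prefix-match lookup. This is illustrative Python; the mount points and namenode URIs are made up for the example:

```python
# Hypothetical per-application mount table: path prefix -> namenode URI.
MOUNT_TABLE = {
    "/tmp":     "hdfs://nn1.example.com:8020",
    "/home":    "hdfs://nn1.example.com:8020",
    "/project": "hdfs://nn2.example.com:8020",
    "/data":    "hdfs://nn3.example.com:8020",
}

def resolve(path):
    """Pick the namenode whose mount point is the longest prefix of path."""
    best = ""
    for prefix in MOUNT_TABLE:
        if path == prefix or path.startswith(prefix + "/"):
            if len(prefix) > len(best):
                best = prefix
    if not best:
        raise KeyError("no mount point covers " + path)
    return MOUNT_TABLE[best], path
```

Applications share a “global” view by sharing this table, or get a per-application view by editing their own copy.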
  • 16. Impact On Existing Deployments
    Very little impact on clusters with a single NN
    Old configurations run as is
    Two commands change:
    NN format and the first upgrade take a new ClusterID option
    During design and implementation, a lot of effort went into ensuring single-NN deployments work as is
    A lot of testing effort went into validating this
  • 17. Summary
    Federated HDFS (Jira HDFS-1052)
    • Existing single Namenode deployments run as is
    • Scale by adding independent Namenodes
    Preserves the robustness of the Namenodes
    Not much code change to the Namenode
    • Generalizes the Block Storage layer
    Can add other implementations of the Namenode
    Even other name services (HBase?)
    Could move block management out of the Namenode in the future
    • Other benefits
    Improved isolation and hence availability
    Isolate different application categories – e.g. a separate Namenode for HBase
  • 19. Questions?
