HDFS Federation - March 2011 HUG

Slides from the HDFS Federation talk at the March 2011 HUG by Suresh Srinivas.

    Presentation Transcript

    • HDFS Federation
      Suresh Srinivas
      Yahoo! Inc
    • Single Namenode Limitations
      Namespace
      The NN process stores the entire metadata in memory
      The number of objects (files + blocks) is limited by the heap size
      A 50 GB heap for 200 million objects supports 4000 DNs and 12 PB of storage at a 40 MB average file size (worked out in the sketch after this slide)
      Storage growth: DN storage from 4 TB to 36 TB and cluster size to 8000 DNs => storage from 12 PB to > 100 PB
      Performance
      File system operations are limited by the throughput of a single NN
      Bottleneck for the Next Generation of MapReduce
      Isolation
      Experimental apps can affect production apps
      Cluster Availability
      Failure of the single namenode brings down the entire cluster
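      The slide’s capacity numbers hang together as a back-of-envelope calculation. Here is a minimal sketch of the arithmetic, assuming (beyond what the slide states) roughly one block per file, since a 40 MB average file fits in a single 128 MB block, and the standard 3x replication factor:

          // Back-of-envelope check of the slide's numbers; the ~1 block/file
          // and 3x replication assumptions are ours, not the slide's.
          public class NamenodeCapacity {
              public static void main(String[] args) {
                  long objects = 200_000_000L;         // files + blocks in a 50 GB NN heap
                  long files = objects / 2;            // ~100M files, ~100M blocks
                  long avgFileBytes = 40L << 20;       // 40 MB average file size
                  long logical = files * avgFileBytes; // ~4 PB of logical data
                  long raw = logical * 3;              // ~12 PB raw with 3x replication
                  long perDn = raw / 4_000;            // ~3 TB per DN across 4000 DNs
                  System.out.printf("logical=%.1f PB raw=%.1f PB perDN=%.1f TB%n",
                      logical / 1e15, raw / 1e15, perDn / 1e12);
              }
          }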
    • Scaling the Name Service: Separate Block Management from NN
      [Chart, not to scale: # clients (1x, 4x, 20x, 50x, 100x) vs. # names (100M, 200M, 1B, 2B, 10B, 20B), comparing approaches: all NS in memory (today), partial NS (cache) in memory, archives, multiple namespace volumes, partial NS in memory with namespace volumes, and a distributed namenode]
      Chart annotations: block reports for billions of blocks require rethinking the block layer; the namespace-volume designs have good isolation properties
    • Why Vertical Scaling Is Not Sufficient
      Why not use NNs with 512 GB of memory?
      Startup time is huge: currently 30 minutes to 2 hours for a 50 GB NN heap
      Stop-the-world GC pauses can bring down the cluster
      All DNs could be declared dead
      Debugging problems in a large JVM heap is harder
      Optimizing NN memory usage is expensive
      Changes in trunk reduce the memory used, but at the cost of development time and code complexity
      Diminishing returns
    • Why Federation?
      Simplicity
      Simpler, more robust design
      Multiple independent namenodes
      Core development took 3.5 months
      Changes mostly in the Datanode, configuration, and tools
      Very little change in the Namenode
      Simpler implementation than a Distributed Namenode
      Less scalability, but it serves the immediate needs
      Federation is an optional feature
      Existing single-NN configurations are supported as is
    • HDFS Background
      [Diagram: a Namenode providing namespace (NS) and block management, layered above Datanodes providing physical storage; the namespace and the block storage form the two layers]
      HDFS has 2 main layers (sketched as interfaces after this slide):
      Namespace management
      Manages the namespace, consisting of directories, files, and blocks
      Supports file system operations such as create/modify/list files & dirs
      Block storage
      Block management: manages DN membership; supports add/delete/modify/get block location; manages replication and replica placement
      Physical storage: supports read/write access to blocks
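      A purely illustrative sketch of the two layers as Java interfaces (these are not Hadoop’s actual APIs; the names are hypothetical):

          import java.util.List;

          // Namespace layer: lives in the Namenode.
          interface NamespaceLayer {
              void create(String path);          // create files & dirs
              List<String> list(String dir);     // list a directory
          }

          // Block storage layer: block management in the NN, physical storage on DNs.
          interface BlockStorageLayer {
              long addBlock(String path);        // allocate a block, choose replica DNs
              byte[] readBlock(long blockId);    // read block bytes from a Datanode
          }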
    • Federation
      [Diagram: Namenodes NN-1, NN-k, NN-n, each serving a namespace (NS1 ... NS k ... foreign NS n) with its own block pool (Pools 1 ... k ... n); Datanodes 1, 2, ... m underneath store blocks for all pools; a Balancer runs across the cluster]
      • Multiple independent namenodes/namespaces in a cluster (see the configuration sketch below)
      • NNs provide both namespace and block management
      • DNs form a common storage layer
      • Each DN stores blocks for all the block pools
      • Non-HDFS namespaces can share the same storage
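      A hedged sketch of how clients and DNs can learn about the set of namenodes. The configuration keys follow current Hadoop releases (the key names in the original 0.23-era release may have differed) and the hostnames are hypothetical:

          import org.apache.hadoop.conf.Configuration;

          public class FederationConfSketch {
              public static void main(String[] args) {
                  Configuration conf = new Configuration();
                  // Two independent nameservices; every DN registers with all of them.
                  conf.set("dfs.nameservices", "ns1,ns2");
                  conf.set("dfs.namenode.rpc-address.ns1", "nn1.example.com:8020");
                  conf.set("dfs.namenode.rpc-address.ns2", "nn2.example.com:8020");
                  System.out.println("nameservices = " + conf.get("dfs.nameservices"));
              }
          }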
    • Federated HDFS Cluster
      Current:
      1 namespace and 1 set of blocks
      Implemented as 1 Namenode plus a set of Datanodes
      Federated HDFS:
      Multiple independent namespaces; a namespace uses 1 block pool
      Multiple independent sets of blocks; a block pool is the set of blocks belonging to a single namespace
      Implemented as multiple Namenodes plus a set of Datanodes; each Datanode stores blocks for all block pools
    • Datanode Changes
      A thread per NN:
      registers with all the NNs
      sends periodic heartbeats to all the NNs with a utilization summary
      sends a block report to each NN for its block pool
      NNs can be added/removed/upgraded on the fly
      Block Pools:
      Automatically created when a DN talks to an NN
      A block is identified by ExtendedBlockID = BlockPoolID + BlockID (see the sketch below)
      Block Pool IDs are unique across clusters, which enables merging clusters
      DN data structures are “indexed” by BPID
      BlockMap, storage, etc. indexed by BPID
      Upgrade/rollback happens per block pool / per NN
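      A minimal sketch (not Hadoop’s actual classes; the names are illustrative) of a DN keying its block map by block pool and running one service thread per NN:

          import java.util.Map;
          import java.util.concurrent.ConcurrentHashMap;

          // A block is identified by BlockPoolID + BlockID, as on the slide.
          final class ExtendedBlockId {
              final String blockPoolId;  // unique across clusters
              final long blockId;
              ExtendedBlockId(String bpid, long id) { blockPoolId = bpid; blockId = id; }
          }

          class DataNodeSketch {
              // One replica map per block pool ("indexed by BPID").
              final Map<String, Map<Long, Object>> blockMapByPool = new ConcurrentHashMap<>();

              // One service thread per NN: register, then heartbeat and
              // block-report for that NN's block pool.
              void startServiceFor(String nnAddress, String bpid) {
                  blockMapByPool.computeIfAbsent(bpid, k -> new ConcurrentHashMap<>());
                  new Thread(() -> { /* register; loop: heartbeat + block reports */ },
                             "BPService-" + bpid + "@" + nnAddress).start();
              }
          }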
    • Other Changes
      Decommissioning
      Tools to initiate and monitor decommissioning at all the NNs
      Balancer
      Allows balancing at the datanode or block pool level
      Datanode daemons
      Disk scanner and directory scanner adapted to federation
      NN Web UI
      Additionally shows the NN’s block pool storage utilization
    • New Cluster Manager Web UI
      Cluster Summary
      Shows overall cluster storage utilization
      List of namenodes
      For each NN - BPID, storage utilization, number of missing blocks, number of live & dead DNs
      Link to each NN’s Web UI
      Decommissioning status of DNs
    • Managing Namespaces
      Federation has multiple namespaces: don’t you need a single global namespace?
      The key is to share the data and the names used to access the shared data
      A global namespace is one way to do that, but even there we talk of several large “global” namespaces
      A client-side mount table is another way to share:
      Shared mount table => “global” shared view
      Personalized mount table => per-application view
      Share the data that matters by mounting it
      [Diagram: a client-side mount table rooted at / with mount points tmp, home, project, and data, each mapping into one of the namespaces; see the configuration sketch below]
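      The client-side mount table idea later shipped in Hadoop as ViewFs. A minimal sketch using ViewFs-style configuration keys (the mount-table name, hostnames, and which namespace serves which path are hypothetical):

          import org.apache.hadoop.conf.Configuration;

          public class MountTableSketch {
              public static void main(String[] args) {
                  Configuration conf = new Configuration();
                  // Clients resolve paths through the mount table, not a single NN.
                  conf.set("fs.defaultFS", "viewfs://clusterA");
                  // Each mount point maps a top-level name into one namespace.
                  conf.set("fs.viewfs.mounttable.clusterA.link./tmp",
                           "hdfs://nn1.example.com:8020/tmp");
                  conf.set("fs.viewfs.mounttable.clusterA.link./home",
                           "hdfs://nn1.example.com:8020/home");
                  conf.set("fs.viewfs.mounttable.clusterA.link./project",
                           "hdfs://nn2.example.com:8020/project");
                  conf.set("fs.viewfs.mounttable.clusterA.link./data",
                           "hdfs://nn2.example.com:8020/data");
              }
          }

      A shared copy of this table gives the “global” view; per-application copies give personalized views.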
    • Impact on Existing Deployments
      Very little impact on clusters with a single NN
      Old configurations run as is
      Two commands change:
      NN format and the first upgrade have a new ClusterID option
      During design and implementation, a lot of effort went into ensuring that single-NN deployments work as is
      A lot of testing effort went into validating this
    • Summary
      Federated HDFS (JIRA HDFS-1052)
      • Existing single-Namenode deployments run as is
      • Scale by adding independent Namenodes
      Preserves the robustness of the Namenode
      Not much code change to the Namenode
      • Generalizes the block storage layer
      Can add other implementations of the Namenode
      Even other name services (HBase?)
      Could move block management out of the Namenode in the future
      • Other Benefits
      Improved isolation and hence availability
      Isolate different application categories, e.g. a separate Namenode for HBase
    • Questions?