HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
Scalability of the NameNode has been a key issue for HDFS clusters. Because the entire file system metadata is stored in memory on a single NameNode, and all metadata operations are processed on this single system, the NameNode both limits the growth in size of the cluster and makes the NameService a bottleneck for the MapReduce framework as demand increases. HDFS Federation horizontally scales the NameService using multiple federated NameNodes/namespaces. The federated NameNodes share the DataNodes in the cluster as a common storage layer. HDFS Federation also adds client-side namespaces to provide a unified view of the file system. In this talk, Hortonworks co-founder and key architect Sanjay Radia discusses the benefits, features and best practices for implementing HDFS Federation.

Presentation Transcript

  • 1. HDFS Federation
    Sanjay Radia, Founder and Architect @ Hortonworks
  • 2. About Me
    •  Apache Hadoop Committer and Member of the Hadoop PMC
    •  Architect of core Hadoop @ Yahoo
       -  Focusing on HDFS, the MapReduce scheduler, compatibility, etc.
    •  PhD in Computer Science from the University of Waterloo, Canada
  • 3. Agenda
    •  HDFS Background
    •  Current Limitations
    •  Federation Architecture
    •  Federation Details
    •  Next Steps
    •  Q&A
  • 4. HDFS Architecture
    Two main layers:
    •  Namespace
       -  Consists of directories, files and blocks
       -  Supports create, delete, modify and list operations on files and directories
    •  Block Storage
       -  Block Management
          •  Datanode cluster membership
          •  Supports create/delete/modify/get block location operations
          •  Manages replication and replica placement
       -  Storage: provides read and write access to blocks
    (Diagram: a Namenode holding the namespace (NS) and block management, above a row of Datanodes providing storage)
  • 5. HDFS Architecture
    Implemented as:
    •  Single Namespace Volume
       -  Namespace Volume = Namespace + Blocks
    •  Single namenode with a namespace
       -  Entire namespace is in memory
       -  Provides Block Management
    •  Datanodes store block replicas
       -  Block files stored on the local file system
    (Diagram: a single Namenode with namespace and block management over a shared Datanode storage layer)
  • 6. Limitation - Isolation
    Poor isolation
    •  All the tenants share a single namespace
       -  A separate volume per tenant is desirable
    •  Lacks separate namespaces for different application categories or application requirements
       -  Experimental apps can affect production apps
       -  Example: HBase could use its own namespace
  • 7. Limitation - Scalability
    Scalability
    •  Storage scales horizontally; the namespace doesn't
    •  Limited number of files, dirs and blocks
       -  250 million files and blocks at a 64 GB Namenode heap size
          •  Still a very large cluster
          •  Facebook clusters are sized at ~70 PB of storage
    Performance
    •  File system operation throughput is limited by a single node
       -  120K read ops/sec and 6,000 write ops/sec
          •  Supports 4K-node clusters easily
          •  Easily scalable to 20K write ops/sec with code improvements
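    A rough back-of-envelope figure (not on the slide): 64 GB of heap for 250 million namespace objects works out to about 64×2^30 / 2.5×10^8 ≈ 275 bytes of Namenode memory per file or block, which is why the in-memory namespace, rather than aggregate disk capacity, becomes the scaling limit.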
  • 8. Limitation – Tight Coupling
    Namespace and Block Management are distinct layers
    •  Tightly coupled due to co-location
    •  Separating the layers makes it easier to evolve each layer
    •  Separating services
       -  Scaling block management independently of the namespace is simpler
       -  Simplifies the namespace and scaling it
    •  Block Storage could be a generic service
       -  The namespace is one of the applications that uses the service
    •  Other services can be built directly on Block Storage
       -  HBase
       -  MR tmp
       -  Foreign namespaces
  • 9. Stated Problem
    Isolation is a problem for even small clusters!
  • 10. HDFS Federation
    (Diagram: a namespace layer with Namenodes NN-1 … NN-k … NN-n, each owning a namespace (NS1 … NSk … NSn, including a foreign NS) and a matching block pool (Pool 1 … Pool k … Pool n); a block storage layer with Datanodes 1 … m providing common storage for all block pools)
    •  Multiple independent Namenodes and Namespace Volumes in a cluster
       -  Namespace Volume = Namespace + Block Pool
    •  Block Storage as a generic storage service
       -  The set of blocks for a Namespace Volume is called a Block Pool
       -  DNs store blocks for all the Namespace Volumes – no partitioning
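    For concreteness, a minimal hdfs-site.xml sketch of a two-NameNode federated setup (not part of the slides; host names and ports are examples, and the property names follow the Apache Hadoop 2.x federation guide, where earlier releases used dfs.federation.nameservices):

        <!-- Federation sketch: two NameNodes, ns1 and ns2, sharing one set of Datanodes. -->
        <configuration>
          <!-- Logical nameservices, one per federated NameNode / namespace volume. -->
          <property>
            <name>dfs.nameservices</name>
            <value>ns1,ns2</value>
          </property>
          <!-- RPC and HTTP addresses for each NameNode (example hosts/ports). -->
          <property>
            <name>dfs.namenode.rpc-address.ns1</name>
            <value>nn1.example.com:8020</value>
          </property>
          <property>
            <name>dfs.namenode.http-address.ns1</name>
            <value>nn1.example.com:50070</value>
          </property>
          <property>
            <name>dfs.namenode.rpc-address.ns2</name>
            <value>nn2.example.com:8020</value>
          </property>
          <property>
            <name>dfs.namenode.http-address.ns2</name>
            <value>nn2.example.com:50070</value>
          </property>
        </configuration>

    Every Datanode reads the same file and registers with both NameNodes. Per the federation guide, each NameNode is formatted with a common cluster ID (hdfs namenode -format -clusterId <id>) so that the shared Datanodes accept blocks for every block pool.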
  • 11. Key Ideas & Benefits
    (Diagram: a generic Storage Service underneath HDFS namespaces, HBase, MR tmp and alternate NN implementations)
    •  Distributed Namespace: partitioned across namenodes
       -  Simple and robust due to independent masters
          •  Each master serves a namespace volume
          •  Preserves namenode stability – little namenode code change
       -  Scalability – 6K nodes, 100K tasks, 200 PB and 1 billion files
  • 12. Key Ideas & Benefits
    (Diagram: the same storage-service picture as the previous slide)
    •  Block Pools enable a generic storage service
       -  Enables Namespace Volumes to be independent of each other
       -  Fuels innovation and rapid development
          •  New implementations of file systems and applications on top of block storage become possible
          •  New block pool categories – tmp storage, distributed cache, small object storage
    •  In the future, move Block Management out of the namenode to a separate set of nodes
       -  Simplifies namespace/application implementation
          •  A distributed namenode becomes significantly simpler
  • 13. HDFS Federation Details
    •  Simple design
       -  Little change to the Namenode; most changes are in the Datanode, config and tools
       -  Core development in 4 months
       -  Namespace and Block Management remain in the Namenode
          •  Block Management could be moved out of the namenode in the future
    •  Little impact on existing deployments
       -  A single-namenode configuration runs as is
    •  Datanodes provide storage services for all the namenodes
       -  Register with all the namenodes
       -  Send periodic heartbeats and block reports to all the namenodes
       -  Send block received/deleted for a block pool to the corresponding namenode
  • 14. HDFS Federation Details
    •  Cluster Web UI for better manageability
       -  Provides a cluster summary
       -  Includes the namenode list and a summary of namenode status
       -  Decommissioning status
    •  Tools (see the command sketch after this slide)
       -  Decommissioning works with multiple namespaces
       -  Balancer works with multiple namespaces
          •  Either Datanode storage or Block Pool storage can be balanced
    •  Namenodes can be added/deleted in a federated cluster
       -  No need to restart the cluster
    •  Single configuration for all the nodes in the cluster
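    A brief sketch of the admin commands involved (not from the slides; option names follow the Apache Hadoop balancer and federation documentation, and the datanode address is a placeholder):

        # Balance either across Datanodes (default) or at the block-pool level
        hdfs balancer -policy datanode
        hdfs balancer -policy blockpool

        # After adding a new NameNode to dfs.nameservices, have each Datanode
        # pick it up without a cluster restart (host:ipc-port is a placeholder)
        hdfs dfsadmin -refreshNamenodes datanode-host.example.com:50020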
  • 15. Managing Namespaces
    (Diagram: a client-side mount table whose root "/" maps data, project, home and tmp to namespaces NS1–NS4)
    •  Federation has multiple namespaces
       -  Don't you need a single global namespace?
       -  Some tenants want a private namespace
       -  Global? The key is to share the data and the names used to access the data
    •  A single global namespace is one way to share
  • 16. Managing Namespaces
    (Diagram: the same client-side mount table, with /data, /project, /home and /tmp mounted on NS1–NS4)
    •  A client-side mount table is another way to share
       -  Shared mount table => "global" shared view
       -  Personalized mount table => per-application view
          •  Share the data that matters by mounting it
    •  Client-side implementation of mount tables (see the configuration sketch after this slide)
       -  No single point of failure
       -  No hotspot for the root and top-level directories
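    As a concrete illustration (not part of the slides), a minimal core-site.xml sketch of a viewfs mount table for the layout in the diagram; the cluster name, namenode hosts and paths are hypothetical, and the property names follow the Apache Hadoop viewfs guide (HADOOP-7426):

        <!-- Client-side mount table sketch for a federated cluster named "ClusterX".
             Hosts and paths are examples only. -->
        <configuration>
          <!-- Clients see a single unified namespace rooted at viewfs://ClusterX/ -->
          <property>
            <name>fs.defaultFS</name>
            <value>viewfs://ClusterX</value>
          </property>
          <!-- Each top-level directory is mounted on one of the federated namespaces. -->
          <property>
            <name>fs.viewfs.mounttable.ClusterX.link./data</name>
            <value>hdfs://nn1.example.com:8020/data</value>
          </property>
          <property>
            <name>fs.viewfs.mounttable.ClusterX.link./project</name>
            <value>hdfs://nn2.example.com:8020/project</value>
          </property>
          <property>
            <name>fs.viewfs.mounttable.ClusterX.link./home</name>
            <value>hdfs://nn3.example.com:8020/home</value>
          </property>
          <property>
            <name>fs.viewfs.mounttable.ClusterX.link./tmp</name>
            <value>hdfs://nn4.example.com:8020/tmp</value>
          </property>
        </configuration>

    Because the table lives in client configuration, there is no central mount service to fail or become a hotspot; a personalized view is simply a different mount table.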
  • 17. Next Steps
    •  Complete separation of the namespace and block management layers
       -  Block storage as a generic service
    •  Partial namespace in memory for further scalability
    •  Move a partial namespace from one namenode to another
       -  A namespace operation – no data copy
  • 18. Next Steps
    (Diagram: a set of Namenodes acting as namespace containers over the shared Datanodes)
    •  Namenode as a container for namespaces
       -  Lots of small namespace volumes
          •  Chosen per user/tenant/data feed
    •  Mount tables for a unified namespace
       -  Can be managed by a central volume server
       -  Move a namespace from one container to another for balancing
          •  Combined with partial namespace
    •  Choose the number of namenodes to match
       -  Sum of (namespace working sets)
       -  Sum of (namespace throughputs)
  • 19. Thank You
    More information
    1.  HDFS-1052: HDFS scalability with multiple namenodes
    2.  HADOOP-7426: user guide for how to use viewfs with federation
    3.  An Introduction to HDFS Federation – https://hortonworks.com/an-introduction-to-hdfs-federation/
  • 20. Other Resources
    •  Next webinar: Improve Hive and HBase Integration
       -  May 2, 2012 @ 10am PST
       -  Register now: http://hortonworks.com/webinars/
    •  Hadoop Summit – June 13-14, San Jose, California
       -  www.Hadoopsummit.org
    •  Hadoop Training and Certification
       -  Developing Solutions Using Apache Hadoop
       -  Administering Apache Hadoop
       -  http://hortonworks.com/training/
    © Hortonworks Inc. 2012
  • 21. Backup slides
  • 22. HDFS Federation Across Clusters
    (Diagram: per-application client-side mount tables in Cluster 1 and Cluster 2, each mounting /home, /tmp, /data and /project)