HDFS deployments are growing in size, but their scalability is limited by the NameNodes (metadata overhead, file blocks, the number of Datanode heartbeats, and the increasing HDFS RPC workload). A common solution is to split the filesystem into multiple smaller subclusters. A challenge with this approach is how to maintain the splits of the subclusters (e.g., namespace partition), avoid forcing users to connect to multiple subclusters, and manage the allocation of directories/files themselves.
To solve this limitation, we have developed HDFS router-based federation, which horizontally scales out HDFS by building a federation layer for subclusters. It provides a federated view of multiple HDFS namespaces, and offers the same RPC and WebHDFS endpoints as Namenodes.
This approach is similar to existing ViewFS and HDFS federation functionality, except the mount table is managed on the service side by the routing layer rather than on client sides. This simplifies access to a federated cluster for existing HDFS clients and also clears the way for innovations inside HDFS subclusters (e.g., moving data among different subclusters for tiered purpose). This design follows the same design as YARN federation which provides near-linear scale-out by simply adding more subclusters. CHAO SUN, Software Engineer, Uber and INIGO GOIRI, Research Software Development Engineer, Microsoft