Hadoop Summit, April 2014
Just as the survival of living species depends on the transfer of essential knowledge within the community and between generations, the availability and reliability of a distributed computer system relies upon consistent replication of core metadata between its components. This presentation will highlight the implementation of a replication technique for the namespace of the Hadoop Distributed File System (HDFS). In HDFS, the namespace represented by the NameNode is decoupled from the data storage layer. While the data layer is conventionally replicated via block replication, the namespace remains a performance and availability bottleneck. Our replication technique relies on quorum-based consensus algorithms and provides an active-active model of high availability for HDFS where metadata requests (reads and writes) can be load-balanced between multiple instances of the NameNode. This session will also cover how the same techniques are extended to provide replication of metadata and data between geographically distributed data centers, providing global disaster recovery and continuous availability. Finally, we will review how consistent replication can be applied to advance other systems in the Apache Hadoop stack; e.g., how in HBase coordinated updates of regions selectively replicated on multiple RegionServers improve availability and overall cluster throughput.