We've updated our privacy policy. Click here to review the details. Tap here to review the details.
Activate your 30 day free trial to unlock unlimited reading.
Activate your 30 day free trial to continue reading.
Download to read offline
Hadoop Distributed File System (HDFS) evolves from a MapReduce-centric storage system to a generic, cost-effective storage infrastructure where HDFS stores all data of inside the organizations. The new use case presents a new sets of challenges to the original HDFS architecture. One challenge is to scale the storage management of HDFS - the centralized scheme within NameNode becomes a main bottleneck which limits the total number of files stored. Although a typical large HDFS cluster is able to store several hundred petabytes of data, it is inefficient to handle large amounts of small files under the current architecture.
In this talk, we introduce our new design and in-progress work that re-architects HDFS to attack this limitation. The storage management is enhanced to a distributed scheme. A new concept of storage container is introduced for storing objects. HDFS blocks are stored and managed as objects in the storage containers instead of being tracked only by NameNode. Storage containers are replicated across DataNodes using a newly-developed high-throughput protocol based on the Raft consensus algorithm. Our current prototype shows that under the new architecture the storage management of HDFS scales 10x better, demonstrating that HDFS is capable of storing billions of files.
Hadoop Distributed File System (HDFS) evolves from a MapReduce-centric storage system to a generic, cost-effective storage infrastructure where HDFS stores all data of inside the organizations. The new use case presents a new sets of challenges to the original HDFS architecture. One challenge is to scale the storage management of HDFS - the centralized scheme within NameNode becomes a main bottleneck which limits the total number of files stored. Although a typical large HDFS cluster is able to store several hundred petabytes of data, it is inefficient to handle large amounts of small files under the current architecture.
In this talk, we introduce our new design and in-progress work that re-architects HDFS to attack this limitation. The storage management is enhanced to a distributed scheme. A new concept of storage container is introduced for storing objects. HDFS blocks are stored and managed as objects in the storage containers instead of being tracked only by NameNode. Storage containers are replicated across DataNodes using a newly-developed high-throughput protocol based on the Raft consensus algorithm. Our current prototype shows that under the new architecture the storage management of HDFS scales 10x better, demonstrating that HDFS is capable of storing billions of files.
You just clipped your first slide!
Clipping is a handy way to collect important slides you want to go back to later. Now customize the name of a clipboard to store your clips.The SlideShare family just got bigger. Enjoy access to millions of ebooks, audiobooks, magazines, and more from Scribd.
Cancel anytime.Unlimited Reading
Learn faster and smarter from top experts
Unlimited Downloading
Download to take your learnings offline and on the go
You also get free access to Scribd!
Instant access to millions of ebooks, audiobooks, magazines, podcasts and more.
Read and listen offline with any device.
Free access to premium services like Tuneln, Mubi and more.
We’ve updated our privacy policy so that we are compliant with changing global privacy regulations and to provide you with insight into the limited ways in which we use your data.
You can read the details below. By accepting, you agree to the updated privacy policy.
Thank you!