
How to build a reliable petabyte-scale file system


During our first Infinit Meetup, Julien talked about how to build a reliable petabyte-scale file system and which solutions already exist, along with their advantages and challenges.

See full talk on Youtube: https://youtu.be/zvF0FbpAALg



  1. How to build a reliable petabyte-scale storage system? http://infinit.sh Infinit (Paris) Meetup May 18th, 2016
  2. the PROBLEMS There exist many storage systems, from appliance-based vendors such as EMC, Dell and NetApp to software-based solutions such as NFS, AFS, Ceph (RedHat), etc. There also exist different types of storage systems depending on the interface they provide: block, object or file. In the following we will focus on the underlying infrastructure, in particular the networking model, which has a huge impact on the system’s scalability. The analysis is pretty clear: almost all existing storage systems tend to be centralized one way or another. This is also true of most distributed systems in general.
  3. NFS A single server means that a single failure can lead to the unavailability, and potentially the permanent loss, of the data.
  4. Ceph Decoupling metadata and data servers increases scalability, as many data servers can be plugged in. However, the metadata servers still remain critical.
  5. the SOLUTION In other words, NFS has a centralized model while Ceph’s is distributed. At Infinit, we advocate decentralization as the way to achieve scalability and availability by removing any bottleneck and single point of failure. Such peer-to-peer systems often rely on two key layers to handle this decentralization.
  6. Overlay Network The overlay network layer connects nodes together and provides a lookup mechanism to find the node responsible for an identifier. The scalability of this layer is crucial; it often relies on structure (key-based routing) to achieve efficient lookups without complete knowledge of the nodes composing the network.
  7. Example: the Chord overlay network
  8. Distributed Hash Table The distributed hash table (DHT) construct relies on the overlay network to provide a put/get interface that reliably stores pieces of information known as objects (think Amazon S3). This layer is responsible for ensuring redundancy (e.g. replication, erasure coding), consistency (e.g. Paxos, quorums), mutability and more.
  9. Example: the DHash distributed hash table, built on the Chord overlay network
  10. the APPLICATIONS On top of such a scalable DHT construct, many applications can be developed, from large-scale object stores to chat, video streaming and file systems. A few companies have relied on peer-to-peer technologies in the past: among them Skype (VoIP), Joost (video streaming), Spotify (music streaming), BitTorrent (file sharing), but also Wuala and Tahoe-LAFS (file storage). A new wave is coming with Storj (object storage), IPFS (content-addressable hypermedia protocol) and Infinit (object/file storage).
  11. the CHALLENGES Most peer-to-peer systems, however, rely on a worldwide community, which implies additional challenges depending on their goal. Untrustworthiness: unless the nodes composing the peer-to-peer network are all under the control of a single entity, the participating entities must either trust each other, or the system must take into account the potentially malicious behavior of a portion of the nodes, a problem commonly referred to as Byzantine fault tolerance. Efficiency: network latency can have a drastic impact on performance depending on the location of the nodes. One way to increase caching is to distinguish mutable blocks from immutable blocks through content hashing (and Merkle trees), which simplifies block validation and caching. Access Control: while centralized distributed systems handle access control by routing requests through a specific server, decentralized peer-to-peer systems must provide security and access control through cryptography. Signatures, encryption (symmetric and asymmetric) and hashing compose the core cryptographic mechanisms relied upon to provide access control without a central entity.
  12. the CONCLUSION We believe developers should consider these constructs because they offer many advantages and open many doors. The well-known blockchain mechanism, for instance, relies on a decentralized set of machines to achieve consensus and record state changes in a distributed ledger. Distributed hash tables (DHTs) rely on the same consensus protocols, the same hashing mechanisms to ensure integrity, and the same signature cryptosystems to ensure non-repudiation.
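To make the key-based routing idea from slides 6-7 concrete, here is a minimal Python sketch of a Chord-style identifier ring. All names are hypothetical, and real Chord adds finger tables so each node finds a successor in O(log n) hops without knowing the whole ring; this sketch assumes full ring knowledge and only shows how keys map to nodes:

```python
import hashlib

def node_id(name: str, bits: int = 16) -> int:
    """Hash a name onto an identifier ring of 2**bits positions."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** bits)

def successor(ring, key: int) -> int:
    """Return the first node clockwise from `key` on the sorted ring.

    In Chord, the node whose identifier is the successor of a key's
    identifier is responsible for that key.
    """
    for nid in ring:
        if nid >= key:
            return nid
    return ring[0]  # wrapped past the highest identifier

# Hypothetical 8-node network and one object to place.
ring = sorted(node_id(f"node-{i}") for i in range(8))
owner = successor(ring, node_id("my-object"))
```

Because both node names and object keys are hashed onto the same ring, adding or removing a node only moves the keys between that node and its neighbor, which is what makes this style of lookup scale.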
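The put/get interface and replication responsibility described on slide 8 can be sketched as a toy in-memory DHT. This is an illustration under simplifying assumptions (hypothetical class and node names, no consistency protocol, no network), not Infinit's implementation: each key is replicated on the next `replicas` nodes clockwise on the hash ring.

```python
import hashlib

class ToyDHT:
    """Toy in-memory DHT: keys map to nodes on a hash ring and each
    value is replicated on `replicas` consecutive nodes."""

    def __init__(self, node_names, replicas=3):
        self.ring = sorted((self._h(n), n) for n in node_names)
        self.replicas = min(replicas, len(node_names))
        self.stores = {n: {} for n in node_names}  # one store per node

    @staticmethod
    def _h(s: str) -> int:
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    def _owners(self, key: str):
        """The successor of the key's hash plus the next replicas-1 nodes."""
        k = self._h(key)
        idx = next((i for i, (h, _) in enumerate(self.ring) if h >= k), 0)
        return [self.ring[(idx + j) % len(self.ring)][1]
                for j in range(self.replicas)]

    def put(self, key: str, value: bytes) -> None:
        for node in self._owners(key):
            self.stores[node][key] = value

    def get(self, key: str):
        # Any surviving replica can answer the read.
        for node in self._owners(key):
            if key in self.stores[node]:
                return self.stores[node][key]
        return None
```

A real system would layer erasure coding or a quorum protocol (e.g. Paxos) on top of this placement scheme; the sketch only shows why losing one node does not lose the object.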
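Slide 11 notes that content hashing simplifies block validation for immutable blocks. A minimal sketch of that idea, with hypothetical function names: the block's address is its hash, so any node can verify a fetched block against its address without trusting the node that served it.

```python
import hashlib

def store_immutable(store: dict, data: bytes) -> str:
    """Content-addressed storage: the block's address IS its hash."""
    addr = hashlib.sha256(data).hexdigest()
    store[addr] = data
    return addr

def fetch_immutable(store: dict, addr: str) -> bytes:
    """Validate the fetched block against its address before using it."""
    data = store[addr]
    if hashlib.sha256(data).hexdigest() != addr:
        raise ValueError("block failed validation: possible tampering")
    return data
```

Mutable blocks cannot be validated this way, since their content changes under a fixed address; that is where the signatures and asymmetric encryption mentioned on the slide come in.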
