Petal/Frangipani (figure: the NFS, Frangipani, and Petal layers, annotated with their properties: untrusted; OS-agnostic; FS semantics; sharing/coordination; disk aggregation (“bricks”); filesystem-agnostic; recovery and reconfiguration; load balancing; chained declustering; snapshots; does not control sharing). Each “cloud” may resize or reconfigure independently. What indirection is required to make this happen, and where is it?
The following slides have been borrowed from the Petal and Frangipani presentations, which were available on the Web until Compaq SRC dissolved. This material is owned by Ed Lee, Chandu Thekkath, and the other authors of the work. The Frangipani material is still available through Chandu Thekkath’s site at www.thekkath.org.
For ECE6160, several issues are important:
Understand the role of each layer in the previous slides and the strengths and limitations of each layer's interface (NAS/SAN) as a basis for innovating behind it.
Understand the concepts of virtual disks and a cluster file system embodied in Petal and Frangipani.
Understand how the features of Petal simplify the design of a scalable cluster file system (Frangipani) above it.
Petal: Distributed Virtual Disks. Edward K. Lee and Chandramohan A. Thekkath, Systems Research Center, Digital Equipment Corporation. 04/14/10
Logical System View (figure): clients running different file systems (AdvFS, NT FS, PC FS, UFS) access Petal virtual disks /dev/vdisk1 through /dev/vdisk5 over a scalable network.
Physical System View (figure): a parallel database or cluster file system uses a single shared virtual disk, /dev/shared1, which is implemented by a set of Petal servers connected by a scalable network.
Each virtual disk provides a 2^64-byte address space.
Created and destroyed on demand.
Allocates disk storage on demand.
Snapshots via copy-on-write.
Online incremental reconfiguration.
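The virtual disk properties above (a huge sparse address space, storage allocated only on demand, and copy-on-write snapshots) can be illustrated with a minimal single-node sketch. It assumes a fixed allocation unit (`BLOCK_SIZE`), whole block-aligned writes, and epoch numbers to tell snapshot versions apart; the class and method names are hypothetical, and distribution, replication, and persistence are ignored entirely.

```python
BLOCK_SIZE = 4096  # hypothetical allocation unit; the slides do not specify one


class VirtualDisk:
    """Single-node sketch of a sparse virtual disk with copy-on-write snapshots."""

    ADDRESS_SPACE = 2 ** 64  # each virtual disk exposes a 2^64-byte space

    def __init__(self):
        self.epoch = 0      # current (writable) epoch
        self.versions = {}  # block index -> [(epoch, data), ...], oldest first

    def physical_blocks(self):
        """Number of physical blocks actually allocated."""
        return sum(len(chain) for chain in self.versions.values())

    def write(self, offset, data):
        # Simplification: only whole, block-aligned writes are supported.
        if offset % BLOCK_SIZE or len(data) != BLOCK_SIZE:
            raise ValueError("write must cover exactly one aligned block")
        if not 0 <= offset < self.ADDRESS_SPACE:
            raise ValueError("offset outside the virtual address space")
        chain = self.versions.setdefault(offset // BLOCK_SIZE, [])
        if chain and chain[-1][0] == self.epoch:
            # Block already belongs to the current epoch: overwrite in place.
            chain[-1] = (self.epoch, data)
        else:
            # First write ever, or first write since a snapshot froze the old
            # version: allocate a new block and leave the old one untouched.
            # (Writes cover whole blocks here, so no data has to be copied.)
            chain.append((self.epoch, data))

    def snapshot(self):
        """Freeze the current contents and return the frozen epoch number."""
        frozen = self.epoch
        self.epoch += 1
        return frozen

    def read(self, offset, epoch=None):
        """Return the newest version of a block visible at `epoch` (default: current)."""
        epoch = self.epoch if epoch is None else epoch
        for e, data in reversed(self.versions.get(offset // BLOCK_SIZE, [])):
            if e <= epoch:
                return data
        return bytes(BLOCK_SIZE)  # never-written blocks read as zeros
```

Writing one block at offset 2**40 allocates exactly one physical block despite the enormous address; taking a snapshot is a constant-time epoch bump, and only the first subsequent write to that block allocates a second copy. Epoch tagging is an assumption of this sketch, chosen to make snapshot creation cheap, not a description of Petal's code.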
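Virtual to Physical Translation (figure): a virtual address (vdiskID, offset) is resolved through the Virtual Disk Directory and the global map (GMap) to the responsible server, and that server's physical map (PMap0 through PMap3 on Servers 0 through 3) translates it to (disk, diskOffset). The end result is (server, disk, diskOffset).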
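The three-stage lookup in the figure can be sketched as follows. This is a simplified illustration under stated assumptions: the directory maps a vdiskID straight to a GMap object, the GMap stripes fixed-size extents (`EXTENT`, a hypothetical size) round-robin across servers, and each server's PMap allocates local disk space on first touch. All names and policies here are hypothetical, not Petal's actual data structures.

```python
from dataclasses import dataclass, field

EXTENT = 64 * 1024  # hypothetical translation granularity in bytes


@dataclass
class GMap:
    """Global map (shared by all servers): which server handles an offset."""
    servers: list

    def server_for(self, offset):
        # Hypothetical policy: stripe EXTENT-sized pieces round-robin over servers.
        return self.servers[(offset // EXTENT) % len(self.servers)]


@dataclass
class PMap:
    """Physical map (local to one server): virtual extent -> local disk location."""
    num_disks: int
    table: dict = field(default_factory=dict)      # (vdisk_id, extent) -> (disk, base)
    next_free: dict = field(default_factory=dict)  # disk -> next unallocated offset

    def lookup(self, vdisk_id, offset):
        key = (vdisk_id, offset // EXTENT)
        if key not in self.table:
            # Allocate on first touch, spreading extents over this server's disks.
            disk = len(self.table) % self.num_disks
            base = self.next_free.get(disk, 0)
            self.next_free[disk] = base + EXTENT
            self.table[key] = (disk, base)
        disk, base = self.table[key]
        return disk, base + offset % EXTENT


def translate(directory, pmaps, vdisk_id, offset):
    """(vdiskID, offset) -> (server, disk, diskOffset)."""
    gmap = directory[vdisk_id]          # virtual disk directory: vdiskID -> GMap
    server = gmap.server_for(offset)    # GMap: offset -> responsible server
    disk, disk_offset = pmaps[server].lookup(vdisk_id, offset)  # that server's PMap
    return server, disk, disk_offset
```

In this sketch only the small directory and GMap would need to be agreed on globally; the larger PMap state stays on the server that owns the physical disks, so the final step of a lookup touches only one server's local table.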
Global State Management
Based on Leslie Lamport’s Paxos algorithm.
Global state is replicated across all servers.
Consistent in the face of server & network failures.
A majority is needed to update global state.
Any server can be added/removed in the presence of failed servers.
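The majority requirement is what keeps replicated global state consistent despite failures: any two majorities of the same server set share at least one member, so every new update reaches some server that saw the previous one. The sketch below shows only this quorum arithmetic, which Paxos builds on; it is not an implementation of Paxos (there are no proposal numbers or protocol phases), and the function names are hypothetical.

```python
from itertools import combinations


def majority(n):
    """Smallest number of servers that forms a majority of n."""
    return n // 2 + 1


def can_update_global_state(n_servers, n_reachable):
    # An update needs a majority of *all* servers, not merely a majority
    # of the servers that happen to be reachable right now.
    return n_reachable >= majority(n_servers)


def majorities_intersect(n):
    """Brute-force check that every pair of majority quorums shares a server."""
    q = majority(n)
    quorums = [set(c) for c in combinations(range(n), q)]
    return all(a & b for a in quorums for b in quorums)
```

With 5 servers, any 3 reachable servers can still update global state, so up to 2 failures are tolerated; with 4 servers, a partition into two halves of 2 leaves neither side able to update, which prevents the halves from diverging.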
Fault-Tolerant Global Operations
Create/Delete virtual disks.
Snapshot virtual disks.
Reconfigure virtual disks.
Data Placement & Redundancy
Supports non-redundant and chained-declustered virtual disks.
Parity can be supported if desired.
Chained-declustering tolerates any single component failure.
Tolerates many common multiple failures.
Throughput scales linearly with additional servers.
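Chained declustering places each block's primary copy on one server and its mirror on the next server around a ring. The sketch below assumes a simple "block number modulo server count" mapping (an assumption; the slides do not say how Petal assigns blocks to chain positions) and checks which failure sets lose data. Any single failure is survivable, and so is any set of failures in which no two failed servers are neighbours, which is one concrete reading of "tolerates many common multiple failures". The function names are hypothetical.

```python
def replicas(block, n_servers):
    """Chained declustering: primary on server block mod n, mirror on its successor."""
    if n_servers < 2:
        raise ValueError("chained declustering needs at least two servers")
    primary = block % n_servers
    return primary, (primary + 1) % n_servers


def survives(failed, n_servers):
    """True if every block still has at least one live copy after `failed` servers die."""
    # Blocks 0..n-1 already cover every distinct (primary, mirror) pair. Each pair
    # is two neighbouring servers on the ring, so data is lost only when two
    # adjacent servers are both in `failed`.
    return all(
        not {p, m} <= failed
        for p, m in (replicas(b, n_servers) for b in range(n_servers))
    )


def readers(block, n_servers, failed):
    """Servers that can still serve reads for `block`."""
    return [s for s in replicas(block, n_servers) if s not in failed]
```

In a ring of 8 servers, 20 of the 28 possible two-server failures leave every block readable; only the 8 adjacent pairs lose data. The `readers` helper also shows why load balancing matters after a failure: a failed server's blocks can then be read only from its neighbours.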