From a client’s view, the Petal storage system looks like a collection of highly available virtual disks that are accessed over a network. Clients create virtual disks as needed, and each virtual disk gives each client access to potentially all the physical capacity and performance of the entire Petal storage system. If additional performance or capacity is needed, you simply plug in additional storage components and they are automatically and transparently integrated into the system. Clients simply notice that performance has improved.
Petal is a good base for building other distributed services.
Virtual disks delay allocation of physical storage from creation time to usage time. As a consequence, the physical resource pools must be periodically monitored and augmented.
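The idea of allocating at usage time rather than creation time can be illustrated with a minimal sketch. This is not Petal's implementation; the class name `ThinVirtualDisk`, the shared free list `pool`, and the block-granular mapping are all hypothetical and exist only to show why the pool has to be watched.

```python
class ThinVirtualDisk:
    """Hypothetical sparse virtual disk: physical blocks are taken on first write."""

    def __init__(self, virtual_blocks, pool):
        self.virtual_blocks = virtual_blocks  # advertised size, costs nothing up front
        self.pool = pool                      # shared list of free physical block numbers
        self.mapping = {}                     # virtual block -> physical block
        self.store = {}                       # physical block -> payload (stands in for a disk)

    def write(self, vblock, payload):
        if not 0 <= vblock < self.virtual_blocks:
            raise IndexError("virtual block out of range")
        if vblock not in self.mapping:
            if not self.pool:
                # This is the case that monitoring the pool is meant to prevent.
                raise RuntimeError("physical pool exhausted; add storage")
            self.mapping[vblock] = self.pool.pop()
        self.store[self.mapping[vblock]] = payload

    def read(self, vblock):
        phys = self.mapping.get(vblock)
        # Blocks that were never written have no physical backing.
        return self.store[phys] if phys is not None else None


pool = [0, 1, 2]
disk = ThinVirtualDisk(virtual_blocks=1000, pool=pool)
free_after_create = len(pool)   # creation consumed no physical blocks
disk.write(42, b"hello")
free_after_write = len(pool)    # the first write consumed one
disk.write(42, b"again")
free_after_rewrite = len(pool)  # overwriting reuses the same physical block
```

Because the advertised size (1000 blocks here) can exceed the pool (3 blocks), writes can fail unless the pool is augmented in time.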
Paxos assumes servers fail by ceasing to operate; they do not generate incorrect output that may confuse other servers or clients. Paxos assumes messages can be arbitrarily lost, delayed or reordered. Paxos guarantees consistency in the face of any combination of server and communication failures. Paxos guarantees progress as long as a majority of servers is up and can communicate with each other in a timely manner.
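The progress condition rests on majorities: any two majorities of the same server set share at least one member. The following is a small sketch of that property only, not of the Paxos protocol itself; the function names are hypothetical.

```python
from itertools import combinations


def has_majority(reachable: int, total: int) -> bool:
    """True if `reachable` servers form a strict majority of `total`."""
    return reachable > total // 2


def majorities_always_intersect(total: int) -> bool:
    """Brute-force check that every pair of minimal majorities overlaps."""
    size = total // 2 + 1
    quorums = [set(q) for q in combinations(range(total), size)]
    return all(a & b for a in quorums for b in quorums)


five_ok = has_majority(3, 5)       # 3 of 5 can make progress
four_split = has_majority(2, 4)    # an even split of 4 cannot
overlap = majorities_always_intersect(5)
```

The overlap is what lets a new majority learn what an earlier majority decided, even when the two groups were formed under different failures.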
Only the failure of two neighboring servers will make data inaccessible. Dynamic load balancing of reads.
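The neighboring-server claim can be checked with a small sketch of chained declustering. It assumes the simplest placement, which may differ from Petal's actual layout: block `b` has its primary copy on server `b mod n` and its secondary copy on the next server in the chain. All function names are hypothetical.

```python
def replicas(block: int, n: int) -> tuple:
    """Return (primary, secondary) servers for a block under the assumed placement."""
    primary = block % n
    return primary, (primary + 1) % n


def is_available(block: int, n: int, failed: set) -> bool:
    """A block is readable if at least one of its two copies is on a live server."""
    return any(server not in failed for server in replicas(block, n))


def all_available(n: int, failed: set) -> bool:
    """Check one block per primary server, which covers every placement pattern."""
    return all(is_available(block, n, failed) for block in range(n))


non_neighbors_down = all_available(4, {0, 2})  # servers 0 and 2 are not adjacent
neighbors_down = all_available(4, {1, 2})      # servers 1 and 2 are adjacent
```

Since every block has two live candidates in the normal case, reads can be sent to whichever of the two servers is less loaded, which is where the dynamic load balancing of reads comes from.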
Reconfigure a 1GB volume while servicing requests. Half of the threads issue 8KB reads; the other half issue 8KB writes. The entire process takes about 5 1/2 minutes. Performance gradually increases to a new, higher level. The process can also be applied in reverse. The system is quiesced at the start; it may be possible to avoid this. The initial performance drop is due to the extra work of configuring servers and to a fraction of the requests being forwarded. This drop can be made arbitrarily small. The small valleys are due to phases in the reconfiguration.
ECE 6160: Advanced Computer Networks
SAN
Instructor: Dr. Xubin (Ben) He
Email: [email_address]
Tel: 931-372-3462
Course web: http://www.ece.tntech.edu/hexb/616f05
Petal/Frangipani
Layers shown: Petal, Frangipani, NFS.
Properties listed: Untrusted; OS-agnostic; FS semantics; Sharing/coordination; Disk aggregation (“bricks”); Filesystem-agnostic; Recovery and reconfiguration; Load balancing; Chained declustering; Snapshots; Does not control sharing.
Each “cloud” may resize or reconfigure independently. What indirection is required to make this happen, and where is it?
The following slides have been borrowed from the Petal and Frangipani presentations, which were available on the Web until Compaq SRC dissolved. This material is owned by Ed Lee, Chandu Thekkath, and the other authors of the work. The Frangipani material is still available through Chandu Thekkath’s site at www.thekkath.org .
For ECE6160, several issues are important:
Understand the role of each layer in the previous slides, and the strengths and limitations of each layer as a basis for innovating behind its interface (NAS/SAN).
Understand the concepts of virtual disks and a cluster file system embodied in Petal and Frangipani.
Understand how the features of Petal simplify the design of a scalable cluster file system (Frangipani) above it.
Petal: Distributed Virtual Disks
Edward K. Lee, Chandramohan A. Thekkath
Systems Research Center, Digital Equipment Corporation
04/14/10
Logical System View: client file systems (AdvFS, NT FS, PC FS, UFS) connect over a scalable network to Petal, which presents the virtual disks /dev/vdisk1 through /dev/vdisk5.
Physical System View: a parallel database or cluster file system accesses /dev/shared1 over a scalable network; the virtual disk is backed by four Petal servers.
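Virtual to Physical Translation: a client address (vdiskID, offset) is resolved through the Virtual Disk Directory and the GMap to one of Server 0 through Server 3. That server's physical map (PMap0 through PMap3) translates the vdiskID and offset to (disk, diskOffset), giving the full physical address (server, disk, diskOffset).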
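The three translation steps can be sketched as plain table lookups. This is an illustration under stated assumptions, not Petal's data structures: the chunk size `CHUNK`, the round-robin choice of server, the table contents, and every identifier below are hypothetical.

```python
CHUNK = 64 * 1024  # assumed translation granularity in bytes

# Virtual Disk Directory: vdiskID -> global map identifier.
vdir = {"vdisk1": "gmap0"}

# GMap: global map identifier -> ordered list of servers holding the disk.
gmaps = {"gmap0": [0, 1, 2, 3]}

# PMap, one per server: (global map id, chunk number) -> (disk, base offset on that disk).
pmaps = [dict() for _ in range(4)]
pmaps[1][("gmap0", 1)] = (2, 128 * 1024)


def translate(vdisk_id: str, offset: int) -> tuple:
    """Map (vdiskID, offset) to (server, disk, diskOffset)."""
    gmap_id = vdir[vdisk_id]                        # step 1: directory lookup
    servers = gmaps[gmap_id]
    chunk = offset // CHUNK
    server = servers[chunk % len(servers)]          # step 2: GMap picks the server
    disk, base = pmaps[server][(gmap_id, chunk)]    # step 3: that server's PMap
    return server, disk, base + offset % CHUNK


location = translate("vdisk1", CHUNK + 100)
```

Only the last step uses state that is local to one server, so the first two tables are the ones that would need to be kept consistent across servers.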