ECE 6160: Advanced Computer Networks
Speaker notes
  • “From a client’s view, the Petal storage system looks like a collection of highly available virtual disks that are accessed over a network. Clients create virtual disks as needed, and each virtual disk gives its client access to potentially all the physical capacity and performance of the entire Petal storage system. If additional performance or capacity is needed, you simply plug in additional storage components, and they are automatically and transparently integrated into the system. Clients simply notice that performance has improved.”
  • Petal is a good base for building other distributed services.
  • Delays allocation of physical storage from creation time to usage time; must periodically monitor and augment physical resource pools.
  • Paxos assumes servers fail by ceasing to operate; they do not generate incorrect output that may confuse other servers or clients. Paxos assumes messages can be arbitrarily lost, delayed or reordered. Paxos guarantees consistency in the face of any combination of server and communication failures. Paxos guarantees progress as long as a majority of servers is up and can communicate with each other in a timely manner.
  • Only the failure of two neighboring servers will make data inaccessible. Dynamic load balancing of reads.
  • Reconfigure a 1 GB volume while servicing requests. Half the threads issue 8 KB reads; the other half issue 8 KB writes. The entire process takes about 5.5 minutes, and performance gradually increases to a new, higher level. The process can be applied in reverse. The system is initially quiesced at the start; it may be possible to avoid this. The initial performance drop is due to the extra work of reconfiguring servers and because a fraction of the requests are being forwarded; this drop can be made arbitrarily small. The small valleys are due to phases in the reconfiguration.
  • Transcript

    • 1. ECE 6160: Advanced Computer Networks SAN Instructor: Dr. Xubin (Ben) He Email: [email_address] Tel: 931-372-3462 Course web:
    • 2. Prev…
      • Networked storage
      • NAS
    • 3. Storage Architectures
    • 4. Storage Area Networks
    • 5. SAN connection
      • FC:
        • FC-SAN
      • LAN (Ethernet)
        • IP-SAN
        • iSCSI
      • Other networks
        • Petal (ATM)
    • 6. Typical SAN
        • Backup solutions (tape sharing)
        • Disaster tolerance solutions (distance to remote location)
        • Reliable, maintainable, scalable infrastructure
    • 7. A real SAN.
    • 8. NAS and SAN shortcomings
      • SAN shortcomings: data to desktop; sharing between NT and UNIX; lack of standards for file access and locking
      • NAS shortcomings: shared tape resources; number of drives; distance to tapes/disks
          • NAS: focuses on applications, users, and the files and data that they share
          • SAN: focuses on disks, tapes, and a scalable, reliable infrastructure to connect them
          • NAS plus SAN: the complete solution, from desktop to data center to storage device
    • 9. NAS plus SAN.
    • 10. Petal/Frangipani: Petal plays the “SAN” role; Frangipani, exported to clients via NFS, plays the “NAS” role
    • 11. Petal/Frangipani: Frangipani provides OS-agnostic FS semantics and sharing/coordination; Petal provides disk aggregation (“bricks”), recovery and reconfiguration, load balancing, chained declustering, and snapshots, is filesystem-agnostic, and does not control sharing; NFS clients are untrusted. Each “cloud” may resize or reconfigure independently. What indirection is required to make this happen, and where is it?
    • 12. Remaining Slides
      • The following slides have been borrowed from the Petal and Frangipani presentations, which were available on the Web until Compaq SRC dissolved. This material is owned by Ed Lee, Chandu Thekkath, and the other authors of the work. The Frangipani material is still available through Chandu Thekkath’s site at .
      • For ECE6160, several issues are important:
        • Understand the role of each layer in the previous slides, and the strengths and limitations of each layer as a basis for innovating behind its interface (NAS/SAN).
        • Understand the concepts of virtual disks and a cluster file system embodied in Petal and Frangipani.
        • Understand how the features of Petal simplify the design of a scalable cluster file system (Frangipani) above it.
    • 13. Petal: Distributed Virtual Disks Systems Research Center Digital Equipment Corporation Edward K. Lee Chandramohan A. Thekkath 04/14/10
    • 14. Logical System View: client file systems (AdvFS, NT FS, PC FS, UFS) on a scalable network access Petal virtual disks /dev/vdisk1 through /dev/vdisk5
    • 15. Physical System View: a parallel database or cluster file system shares /dev/shared1, served by multiple Petal servers over a scalable network
    • 16. Virtual Disks
      • Each disk provides 2^64 byte address space.
      • Created and destroyed on demand.
      • Allocates disk storage on demand.
      • Snapshots via copy-on-write.
      • Online incremental reconfiguration.
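The virtual-disk properties above (on-demand allocation, copy-on-write snapshots) can be sketched in a few lines. This is a toy model under assumptions of my own, not Petal's implementation; the class and method names are invented.

```python
# Illustrative sketch of a virtual disk with on-demand block allocation
# and copy-on-write snapshots. A snapshot freezes the current block map;
# later writes replace entries in the live map without touching it.

class VirtualDisk:
    def __init__(self):
        self.blocks = {}        # offset -> data, allocated on demand
        self.snapshots = []     # frozen block maps

    def write(self, offset, data):
        self.blocks[offset] = data      # storage allocated at usage time

    def read(self, offset):
        return self.blocks.get(offset, b"\0")  # unallocated reads as zeros

    def snapshot(self):
        snap = dict(self.blocks)        # copy the map; block data is shared
        self.snapshots.append(snap)
        return snap

vd = VirtualDisk()
vd.write(0, b"v1")
snap = vd.snapshot()
vd.write(0, b"v2")    # copy-on-write effect: the snapshot still sees b"v1"
```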
    • 17. Virtual to Physical Translation: (vdiskID, offset) → Virtual Disk Directory → GMap → per-server PMap (PMap0–PMap3 on Servers 0–3) → (server, disk, diskOffset)
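The translation path can be sketched as a three-step lookup. This is an illustrative model, not Petal's code: the data structures, block size, and round-robin GMap below are invented for the sketch.

```python
# Hypothetical sketch of Petal's virtual-to-physical translation:
# (vdiskID, offset) -> Virtual Disk Directory -> GMap -> per-server PMap
# -> (server, disk, diskOffset). All structures here are illustrative.

NUM_SERVERS = 4
BLOCK = 64 * 1024  # translation granularity (assumed for the sketch)

vdir = {1: "gmap-a"}                        # vdiskID -> global map id
gmaps = {"gmap-a": lambda off: (off // BLOCK) % NUM_SERVERS}  # offset -> server
pmaps = {s: {} for s in range(NUM_SERVERS)}  # per-server physical maps

def allocate(server, offset):
    # PMap entries are created on demand: physical storage is
    # allocated at usage time, not at virtual-disk creation time.
    pmaps[server].setdefault(offset, ("disk0", len(pmaps[server]) * BLOCK))
    return pmaps[server][offset]

def translate(vdisk_id, offset):
    gmap = gmaps[vdir[vdisk_id]]            # step 1: directory lookup
    server = gmap(offset)                   # step 2: global map -> server
    disk, disk_off = allocate(server, offset)  # step 3: server-local PMap
    return server, disk, disk_off
```

The two-level split matters: the small, replicated Virtual Disk Directory and GMap give every server a consistent global view, while each bulky PMap stays local to the server that owns the data.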
    • 18. Global State Management
      • Based on Leslie Lamport’s Paxos algorithm.
      • Global state is replicated across all servers.
      • Consistent in the face of server & network failures.
      • A majority is needed to update global state.
      • Any server can be added/removed in the presence of failed servers.
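The majority rule behind this can be made concrete: any two majorities of the same server set must overlap in at least one server, which is why committed updates to the replicated global state cannot diverge. The sketch below is a minimal illustration of that quorum property, not Petal's Paxos code.

```python
# Minimal sketch of the majority-quorum rule behind Paxos-based global
# state: an update commits only when a majority of servers accept it,
# so any two committed updates share at least one server.
from itertools import combinations

def majority(n):
    return n // 2 + 1

def commit(acks, n):
    # Replicated global state is updated only with a majority of acks.
    return len(set(acks)) >= majority(n)

# Any two majorities of the same server set must intersect:
servers = range(5)
for q1 in combinations(servers, majority(5)):
    for q2 in combinations(servers, majority(5)):
        assert set(q1) & set(q2)
```

This is also why progress needs a communicating majority: with fewer than `majority(n)` servers up, no update can commit, but none can be lost either.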
    • 19. Fault-Tolerant Global Operations
      • Create/Delete virtual disks.
      • Snapshot virtual disks.
      • Add/Remove servers.
      • Reconfigure virtual disks.
    • 20. Data Placement & Redundancy
      • Supports non-redundant and chained-declustered virtual disks.
      • Parity can be supported if desired.
      • Chained-declustering tolerates any single component failure.
      • Tolerates many common multiple failures.
      • Throughput scales linearly with additional servers.
      • Throughput degrades gracefully with failures.
    • 21. Chained Declustering (each block stored on two neighboring servers):
      Server0: D0 D3 D4 D7
      Server1: D1 D0 D5 D4
      Server2: D2 D1 D6 D5
      Server3: D3 D2 D7 D6
    • 22. Chained Declustering with Server1 failed: its blocks (D1 D0 D5 D4) remain available from the surviving copies on Server0 and Server2:
      Server0: D0 D3 D4 D7
      Server1: (failed)
      Server2: D2 D1 D6 D5
      Server3: D3 D2 D7 D6
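The placement rule in these two slides can be stated compactly: block i's primary copy lives on server i mod n and its secondary on server (i + 1) mod n. A short sketch (illustrative, not Petal's code) shows why any single failure leaves all data readable while two neighboring failures do not:

```python
# Chained-declustering placement: block i's primary is on server i mod n,
# its secondary on server (i + 1) mod n. Any single server failure leaves
# every block readable; only two *neighboring* failures lose data.

def replicas(block, n):
    return (block % n, (block + 1) % n)   # (primary, secondary)

def readable(block, n, failed):
    return any(s not in failed for s in replicas(block, n))

n = 4
layout = {s: [] for s in range(n)}
for b in range(8):
    p, sec = replicas(b, n)
    layout[p].append(("P", b))
    layout[sec].append(("S", b))

# Single failure: all blocks still readable.
assert all(readable(b, n, {1}) for b in range(8))
# Neighboring double failure: the block replicated only there is lost.
assert not all(readable(b, n, {1, 2}) for b in range(8))
```

Because every block has two candidate servers, reads can be routed to whichever copy is less loaded, which is the dynamic load balancing mentioned in the notes.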
    • 23. The Prototype
      • Digital ATM network.
        • 155 Mbit/s per link.
      • 8 AlphaStation Model 600.
        • 333 MHz Alpha running Digital Unix.
      • 72 RZ29 disks.
        • 4.3 GB, 3.5 inch, fast SCSI (10MB/s).
        • 9 ms avg. seek, 6 MB/s sustained transfer rate.
      • Unix kernel device driver.
      • User-level Petal servers.
    • 24. The Prototype: clients src-ss1, src-ss2, …, src-ss8 each access /dev/vdisk1 through servers petal1, petal2, …, petal8 over the Digital ATM Network (AN2)
    • 25. Throughput Scaling
    • 26. Virtual Disk Reconfiguration: from 6 servers to 8 servers; virtual disk with 1 GB of allocated storage; 8 KB reads & writes
    • 27. Frangipani: A Scalable Distributed File System C. A. Thekkath, T. Mann, and E. K. Lee Systems Research Center Digital Equipment Corporation
    • 28. Why Not An Old File System on Petal?
      • Traditional file systems (e.g., UFS, AdvFS) cannot share a block device
      • The machine that runs the file system can become a bottleneck
    • 29. Frangipani
      • Behaves like a local file system
        • multiple machines cooperatively manage a Petal disk
        • users on any machine see a consistent view of data
      • Exhibits good performance, scaling, and load balancing
      • Easy to administer
    • 30. Ease of Administration
      • Frangipani machines are modular
        • can be added and deleted transparently
      • Common free space pool
        • users don’t have to be moved
      • Automatically recovers from crashes
      • Consistent backup without halting the system
    • 31. Components of Frangipani
      • File system core
        • implements the Digital Unix vnode interface
        • uses the Digital Unix Unified Buffer Cache
        • exploits Petal’s large virtual space
      • Locks with leases
      • Write-ahead redo log
    • 32. Locks
      • Multiple reader/single writer
      • Locks are moderately coarse-grained
        • protects entire file or directory
      • Dirty data is written to disk before lock is given to another machine
      • Each machine aggressively caches locks
        • uses lease timeouts for lock recovery
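The lease mechanism can be sketched as follows. This is a hypothetical, deterministic model (the class, lease length, and explicit `now` parameter are invented for illustration), not Frangipani's lock service:

```python
# Sketch of lease-based locking: a holder's lock is valid only for a
# lease period and must be renewed; if the holder crashes, the lock
# can be reclaimed once the lease expires.

LEASE = 30.0  # seconds (illustrative)

class LeaseLock:
    def __init__(self):
        self.holder = None
        self.expires = 0.0

    def acquire(self, who, now):
        if self.holder is None or now >= self.expires:
            self.holder, self.expires = who, now + LEASE
            return True
        return False              # held under a live lease

    def renew(self, who, now):
        if self.holder == who and now < self.expires:
            self.expires = now + LEASE
            return True
        return False              # lease already lost

lock = LeaseLock()
assert lock.acquire("m1", now=0.0)
assert not lock.acquire("m2", now=10.0)   # m1's lease is still live
assert lock.renew("m1", now=20.0)         # cached lock kept via renewal
assert lock.acquire("m2", now=60.0)       # m1 silent past expiry; reclaimed
```

The expiry check is what makes crash recovery possible without a perfect failure detector: a machine that stops renewing is treated as failed once its lease runs out.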
    • 33. Logging
      • Frangipani uses a write-ahead redo log for metadata
        • log records are kept on Petal
      • Data is written to Petal
        • on sync, fsync, or every 30 seconds
        • on lock revocation or when the log wraps
      • Each machine has a separate log
        • reduces contention
        • independent recovery
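The write-ahead discipline and redo replay can be sketched in a few lines. The record format and helper names here are invented for illustration; Frangipani's actual log format is more involved:

```python
# Sketch of write-ahead redo logging: a metadata update reaches the
# log before it is applied, so after a crash any machine can replay
# the per-machine log (stored on Petal) to recover the update.

def log_update(log, key, value):
    log.append((key, value))     # 1. redo record is logged first
    return log                   # 2. only then is the update applied

def apply_log(log, metadata):
    # Redo replay is idempotent: re-applying a record sets the same value.
    for key, value in log:
        metadata[key] = value
    return metadata

log = []
log_update(log, "inode7.size", 4096)
log_update(log, "dir2.entries", ("f", 7))
# Crash before the in-memory metadata was written back; recovery replays:
recovered = apply_log(log, {})
```

Because each machine has its own log, replay never has to merge records from different machines, which is what makes recovery independent and lets any surviving machine perform it.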
    • 34. Recovery
      • Recovery is initiated by the lock service
      • Recovery can be carried out on any machine
        • log is distributed and available via Petal
    • 35. References
      • E. Lee and C. Thekkath, “Petal: Distributed Virtual Disks,” Proceedings of the 7th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 1996.
      • P. Sarkar, S. Uttamchandani, and K. Voruganti, “Storage over IP: When Does Hardware Support Help?” Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST), 2003.
      • C. Thekkath, T. Mann, and E. Lee, “Frangipani: A Scalable Distributed File System,” Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP), pp. 224–237, October 1997.