Lustre+ZFS: Reliable/Scalable Storage
Transcript

  • 1. Lustre+ZFS: Reliable/Scalable Storage
    Josh Judd, CTO
    © 2012 WARP Mechanics Ltd. All Rights Reserved.
  • 2. ZFS+Lustre: Open Storage Layers
    • ZFS:
      – Volume management layer (RAID)
      – Reliable storage (checksums)
      – Feature-rich (snapshots, compression, replication, etc.)
      – Accelerators (SSD hybrid)
      – Scalable (e.g., 16 exabytes for a single file)
    • Lustre:
      – Linux-centric scale-out filesystem
      – Powers the world's largest supercomputers
      – A single FS can be 10s of PB with TB/s of throughput
      – Can sit on top of ZFS
  • 3. What does ZFS do for me?
    • HPC-relevant features:
      – Support for 1/10GbE, 4/8/16Gb FC, and 40Gb InfiniBand
      – Multi-layer cache combines DRAM and SSDs with HDDs (sketched below)
      – Copy-on-write eliminates write holes and accelerates writes
      – Checksums eliminate silent data corruption and bit rot
      – Snapshots, thin provisioning, compression, de-dupe, etc. built in
      – Lustre and SNFS integration allows 40GbE networking
    • Same software/hardware supports NAS and RAID
      – One management code base to control all storage platforms
    • Open storage: you can have the source code.
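    The hybrid-cache layering described above maps onto stock ZFS pool commands. Below is a minimal sketch, in Python shelling out to the standard zpool CLI, of fronting an HDD capacity tier with SSD read cache (L2ARC) and a mirrored write log (ZIL/SLOG); the pool name and every device name are hypothetical placeholders.

    ```python
    #!/usr/bin/env python3
    """Minimal sketch of assembling a hybrid ZFS pool: HDDs for capacity,
    SSDs for L2ARC read cache, mirrored SSDs for the ZIL/SLOG.
    Pool name 'tank' and all device names are hypothetical placeholders."""
    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # Capacity tier: a raidz2 vdev of six HDDs
    run(["zpool", "create", "tank", "raidz2",
         "sda", "sdb", "sdc", "sdd", "sde", "sdf"])

    # Read-cache tier: two SSDs as L2ARC devices
    run(["zpool", "add", "tank", "cache", "sdg", "sdh"])

    # Write-log tier: mirrored SSDs as the ZIL/SLOG
    run(["zpool", "add", "tank", "log", "mirror", "sdi", "sdj"])
    ```

    The log devices are mirrored because they protect in-flight synchronous writes; the cache devices need no redundancy, since L2ARC contents are only copies of data already on the pool.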
  • 4. ZFS Feature Focus
    • Enhanced data integrity:
      – Legacy RAID is subject to silent corruption and write holes
      – Little-known fact: virtually no RAID checks parity on each read, so if a bit flips, you simply get the wrong data back!
      – ZFS adds a checksum to every block to solve this (illustrated below)
    • Advanced cache:
      – Legacy RAID has too little cache to be meaningful
      – ZFS-based WARPraid supports 100s of GB of DRAM, plus 10s of TB of SSD
      – E.g., you can "push" directories onto SSD cache ahead of time to drastically accelerate read-intensive workloads
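    To make the "check on every read" point concrete, here is a conceptual illustration in Python of the per-block checksum idea. It is not ZFS code (ZFS defaults to fletcher4 rather than SHA-256, and stores the checksum in the parent block pointer), but the failure mode it demonstrates is exactly the one the slide describes.

    ```python
    #!/usr/bin/env python3
    """Conceptual illustration (not ZFS code) of why per-block checksums
    catch silent corruption that parity-only RAID misses on reads."""
    import hashlib, os

    BLOCK = os.urandom(4096)                      # a 4 KiB data block
    stored_sum = hashlib.sha256(BLOCK).digest()   # checksum kept with the block pointer

    # Simulate bit rot: flip one bit somewhere in the block
    corrupted = bytearray(BLOCK)
    corrupted[1234] ^= 0x01

    # A plain RAID read returns the data without checking parity, so the
    # flipped bit goes unnoticed. A ZFS-style read verifies the checksum:
    if hashlib.sha256(bytes(corrupted)).digest() != stored_sum:
        print("checksum mismatch: corruption detected, repair from redundancy")
    ```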
  • 5. What does Lustre do for me?
    • Category: distributed (parallel-ish) filesystem
      – Similar(-ish) to StorNext, GPFS, pNFS, etc.
    • Allows performance and capacity to scale independently (see the sketch below)
    • One FS can be 10s of PB with TB/s of throughput
    • No theoretical upper limit on architectural scalability
      – Yes, there are practical limits...
      – But those expand every year, and...
      – Even a "practical" Lustre FS can be 10s or 1000s of times larger than other FSs on the market
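    A back-of-envelope sketch of what "scale independently" means: capacity grows with the number and size of OSTs, while aggregate throughput grows with the number of OSS nodes serving them. All of the figures below are illustrative assumptions, not measured numbers.

    ```python
    #!/usr/bin/env python3
    """Back-of-envelope sketch of independent scaling in Lustre: capacity
    follows OST count/size, throughput follows OSS node count.
    Every figure here is an illustrative assumption."""

    def lustre_fs(oss_nodes, gbps_per_oss, osts, tb_per_ost):
        return {"capacity_PB": osts * tb_per_ost / 1000,
                "throughput_GBps": oss_nodes * gbps_per_oss}

    # Add capacity without touching performance: more OSTs
    print(lustre_fs(oss_nodes=8, gbps_per_oss=3, osts=64,  tb_per_ost=100))
    print(lustre_fs(oss_nodes=8, gbps_per_oss=3, osts=128, tb_per_ost=100))

    # Add performance without adding capacity: more OSS nodes, same OSTs
    print(lustre_fs(oss_nodes=16, gbps_per_oss=3, osts=128, tb_per_ost=100))
    ```

    Doubling the OSTs doubles capacity without changing throughput; doubling the OSS nodes doubles throughput without adding a byte of capacity.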
  • 6. How do they combine?
    • ZFS is currently supportable on Solaris; Lustre is currently supportable on Linux. So... how do they mix?
    • In theory, any of three ways:
      – Port Lustre to Solaris (has not been done)
      – Port ZFS to Linux (in progress; replaces MD/RAID and ext4)
      – Use ZFS on Solaris as a RAID controller under Lustre on Linux
    • WARP Mechanics focuses on the third option (sketched below):
      – Supportable in production now
      – Maintains full separation of code
      – Still allows ZFS to replace ext4 down the road, while performing volume management on separate controllers, which aids performance and scalability
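    A minimal sketch of that third option, assuming hypothetical pool and filesystem names, MGS NID, and device paths: the Solaris-side ZFS controller carves a zvol out of its pool and presents it as a LUN (the FC or iSCSI export step is not shown), and the Linux OSS formats that LUN as an ordinary OST.

    ```python
    #!/usr/bin/env python3
    """Sketch of option three: a ZFS controller presents a LUN (a zvol)
    that a Linux OSS formats as a Lustre OST. The pool/volume names, the
    MGS NID, and the device paths are hypothetical placeholders."""
    import subprocess

    def run(cmd):
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    # --- On the ZFS controller: carve a block volume out of the pool ---
    run(["zfs", "create", "-V", "10T", "tank/ost0"])
    # (The zvol is then exported to the OSS as a SCSI LUN over FC or iSCSI.)

    # --- On the Linux OSS: format the imported LUN as an OST and mount it ---
    run(["mkfs.lustre", "--ost", "--fsname=warpfs",
         "--mgsnode=10.0.0.1@o2ib", "--index=0", "/dev/sdb"])
    run(["mount", "-t", "lustre", "/dev/sdb", "/mnt/ost0"])
    ```

    Because the OSS sees only a plain block device, the ZFS and Lustre code bases stay fully separated, which is exactly the point of this option.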
  • 7. Example Architecture
    [Diagram: Lustre clients connect over QDR/FDR IB or 10/40Gbps Ethernet to paired OSS nodes (OSS 1a/1b, OSS 2a/2b). Each OSS pair sits on point-to-point links to paired ZFS RAID controllers (RAID 1a/1b, RAID 2a/2b), each with ARC, ZIL, and L2ARC caches in front of ~250TB of HDDs, forming ZFS RAID 1 and ZFS RAID 2.]
  • 8. Example Architecture
    Scale-out example: 200 ZFS RAID systems = 400 controllers = 1.2TB/sec; 16,000 NL-SAS HDDs = ~40PB usable
    [Diagram: Lustre clients connect over InfiniBand or high-speed Ethernet to OSS pairs (OSS1a, OSS1b, ...) backed by RAID pairs (R1a, R1b, ...) spread across nodes N-001 through N-200.]
    ~200TB usable per Neutronium; estimated 3GB/sec per controller of well-formed I/O
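    The slide's numbers are internally consistent, as a quick check with its own figures shows (two controllers per system at an estimated 3GB/sec each, ~200TB usable and 80 NL-SAS drives per system):

    ```python
    #!/usr/bin/env python3
    """Verifying the slide's scale-out arithmetic using its own figures."""

    systems = 200
    controllers = systems * 2            # 400 controllers
    throughput = controllers * 3         # GB/s -> 1200 GB/s = 1.2 TB/s
    capacity = systems * 200             # TB   -> 40,000 TB = ~40 PB usable
    hdds = systems * 80                  # 16,000 NL-SAS HDDs

    print(f"{controllers} controllers, {throughput/1000:.1f} TB/s aggregate")
    print(f"{capacity/1000:.0f} PB usable across {hdds:,} drives")
    ```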
  • 9. Example Architecture
    [Diagram: Non-Lustre clients reach the filesystem over IP (LAN/CAN/WAN) through SMB and NFS gateway nodes, while Lustre clients connect directly over InfiniBand or high-speed Ethernet to the same OSS and RAID pairs.]
    ~200TB usable per Neutronium; estimated 3GB/sec per controller of well-formed I/O
  • 10. Practical WARP Implementation
    • The PetaPod Appliance: [product image]
