GPFS: General Parallel File System
Why is it needed?
What is GPFS and its features?
Where is it being used?
Why is GPFS needed?
Growth Rate of Components
• CPU speed performance has increased 8 to 10 times.
• DRAM speed performance has increased 7 to 9 times.
• Network speed performance has increased 100 times.
• Bus speed performance has increased 20 times.
• But hard disk drive (HDD) speed performance has increased only 1.2 times.
Three Important Functions of Enterprise Storage
• Store data
• Protect data from being lost
• Feed data to the computer's processors (so they can keep doing work)
Limitations of Existing Solutions
• DAS, NAS, and SAN are not sufficient on their own
• Many data centers have become victims of “filer sprawl”
• Data administration and management costs (such as migration, backups, archiving) skyrocket
• I/O performance and application workflow suffer
What is GPFS?
• The General Parallel File System (GPFS) is a high-performance clustered file system. It can be deployed in shared-disk or shared-nothing distributed parallel modes.
• Developer(s): IBM
• Operating system: AIX / Linux / Windows Server
• License: Proprietary
• System Introduced: 1998 (AIX)
• Max. volume size: 8 YB
• Max. file size: 8 EB
• Max. number of files: 2^64 per file system
• File system permissions: POSIX
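To give a sense of scale, the limits listed above can be rewritten as powers of two. This is only a sketch: it assumes the prefixes are binary (1 EB = 2^60 bytes, 1 YB = 2^80 bytes), which the slide does not state, and it reads the file-count limit as 2^64. The helper name `log2_exact` is illustrative.

```python
# Assumption: binary prefixes (the slide does not say which convention it uses).
EB = 2**60
YB = 2**80

max_file_size = 8 * EB      # "Max. file size: 8 EB"
max_volume_size = 8 * YB    # "Max. volume size: 8 YB"
max_files = 2**64           # "Max. number of files", read as 2^64


def log2_exact(n: int) -> int:
    """Return k such that n == 2**k; raise if n is not a power of two."""
    k = n.bit_length() - 1
    if n <= 0 or n != 1 << k:
        raise ValueError(f"{n} is not a power of two")
    return k


print(f"max file size   = 2^{log2_exact(max_file_size)} bytes")
print(f"max volume size = 2^{log2_exact(max_volume_size)} bytes")
print(f"max file count  = 2^{log2_exact(max_files)}")
```

Under these assumptions, a single file can span 2^63 bytes, which matches the range of a signed 64-bit file offset.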
GPFS Current Usage
• It is used by many of the world's largest commercial companies, as well as some of the supercomputers on the TOP500 list.
• For example, GPFS was the file system of the ASC Purple supercomputer, which was composed of more than 12,000 processors and 2 petabytes of total disk storage spanning more than 11,000 disks.
• IBM's GPFS is used extensively across multiple industries, such as government, oil and gas, life sciences, media/entertainment, and financial services.
GPFS Features
• Standard file system interface with POSIX semantics
– Metadata on shared storage
– Distributed locking for read/write semantics
• Highly scalable
– High capacity (up to 2^99 bytes file system size, up to 2^63 files per file system)
– High throughput (TB/s)
– Wide striping
– Large block size (up to 16 MB)
– Multiple nodes write in parallel
• Advanced data management
– ILM (storage pools), Snapshots
– Backup, HSM (DMAPI)
– Remote replication, WAN caching
• High availability
– Fault tolerance (node, disk failures)
– On-line system management (add/remove nodes, disks, ...)
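The idea behind wide striping with large blocks can be illustrated with a small sketch. It assumes plain round-robin placement of fixed-size blocks across disks; this is a simplification for illustration, not GPFS's actual allocation algorithm, and the function names `locate` and `disks_touched` are hypothetical.

```python
BLOCK_SIZE = 16 * 1024 * 1024  # 16 MB, the maximum block size named above


def locate(offset: int, num_disks: int, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a byte offset in a file to (disk index, block index on that disk).

    Assumes simple round-robin striping: file block b lives on disk b % num_disks.
    """
    if offset < 0 or num_disks <= 0 or block_size <= 0:
        raise ValueError("offset must be >= 0; num_disks and block_size must be > 0")
    block = offset // block_size
    return block % num_disks, block // num_disks


def disks_touched(offset: int, length: int, num_disks: int,
                  block_size: int = BLOCK_SIZE) -> set[int]:
    """Return the set of disks that a read of `length` bytes at `offset` hits."""
    if length <= 0:
        return set()
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return {b % num_disks for b in range(first, last + 1)}


# A 1 GiB sequential read over 8 disks spans 64 blocks, 8 on each disk,
# so all disks can serve the request in parallel.
print(sorted(disks_touched(0, 2**30, num_disks=8)))
```

Because consecutive blocks land on different disks, a large sequential read or write is spread over every disk in the file system, which is what allows aggregate throughput to grow as disks are added.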
