Scale-out NAS and File System Benchmarking
Advanced Research Computing
Outline
• Scale-out NAS: beyond the buzzwords
– Parallel storage vs scale-out NAS
– GPFS
– NFS v4.1
• Entering the post-RAID era
• A look at two SoNAS products
– EMC Isilon X-series and S-series
– NetApp FAS 3200 series
Advanced Research Computing
SCALE-OUT NAS: THE REAL STORY
Advanced Research Computing
• Distributed file system
– Storage system whose devices can be accessed concurrently by many clients
– Namespace may be global
– Scalable aggregate throughput
• A distributed storage system is not the same as parallel storage
Distributed Storage
Advanced Research Computing
• Parallel file system
– Storage system whose devices can be accessed concurrently by many clients, even when all the clients access the same file
– A file is striped across several storage devices (sketched below)
– Global namespace
– Scalable throughput at file level
Parallel Storage
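A minimal sketch of the striping idea from the slide above: a byte offset in a striped file maps to a storage device and a device-local offset. The stripe size, device count, and the round-robin placement are illustrative assumptions, not the layout of any particular parallel file system.

```python
# Round-robin striping sketch (illustrative values, not a real file system's layout).
STRIPE_SIZE = 1 << 20   # 1 MiB stripe unit (assumption)
N_DEVICES = 4           # number of storage devices the file is striped over (assumption)

def locate(offset: int, stripe_size: int = STRIPE_SIZE, n_devices: int = N_DEVICES):
    """Map a file byte offset to (device_index, offset_within_device)."""
    stripe_index = offset // stripe_size        # which stripe unit the byte falls in
    device = stripe_index % n_devices           # stripe units are dealt out round-robin
    local_stripe = stripe_index // n_devices    # each device stores every n-th stripe unit
    return device, local_stripe * stripe_size + offset % stripe_size

if __name__ == "__main__":
    for off in (0, STRIPE_SIZE, 5 * STRIPE_SIZE + 123):
        print(off, "->", locate(off))
```

Because consecutive stripe units land on different devices, one large sequential read or write is served by all of the devices at once, which is where the per-file throughput scaling comes from.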
Advanced Research Computing
• Directly Attached Storage
– All cluster nodes have direct access to the
storage devices
• All cluster nodes are storage servers
– The storage fabric can be separate from the cluster interconnect fabric or shared with it
– Storage devices and file servers can be scaled independently
Storage Topologies 1
Advanced Research Computing
[Diagram: every cluster node is also a file system server with direct access to the storage devices behind RAID controllers over a storage interconnect fabric (10GE, FCoE, InfiniBand); the nodes are linked by a cluster interconnect fabric (10GE, InfiniBand).]
Directly Attached Storage
Advanced Research Computing
• Network Attached Storage
– Only some cluster nodes have direct access to
storage media
– So cluster nodes are of two types
• clients
• servers
– Scaling of storage servers and storage devices
happens simultaneously
Storage Topologies 2
Advanced Research Computing
[Diagram: file system servers, each with a RAID controller and storage devices, serve client nodes over an interconnect fabric (10GE, InfiniBand).]
Network Attached Storage
Advanced Research Computing
• GPFS supports multiple topologies
– directly attached,
– network attached, and
– hybrid topologies
• Reliability is achieved via
– hardware: redundant controllers, RAID, active-active servers
– software: explicit data replication
– combination of hardware and software
IBM GPFS
Advanced Research Computing
[Diagram: cluster nodes run GPFS NSD servers with direct access to the storage devices behind RAID controllers over a storage interconnect fabric (10GE, FCoE, InfiniBand); a cluster interconnect fabric (10GE, InfiniBand) links the nodes.]
GPFS Directly Attached Storage
Advanced Research Computing
[Diagram: dedicated storage nodes run NSD servers in front of RAID controllers and storage devices; cluster nodes run NSD clients and reach them over an interconnect fabric (10GE, InfiniBand).]
GPFS Network Attached Storage (1)
Advanced Research Computing
[Diagram: GPFS NSD servers sit between a server interconnect fabric (10GE, InfiniBand) facing the NSD clients on the cluster nodes and a storage interconnect fabric (10GE, FCoE, InfiniBand) connecting the RAID controllers and storage devices.]
GPFS Network Attached Storage (2)
Advanced Research Computing
Parallel Access Protocols
• NFS is a head-node NAS architecture in which the NFS server is the bottleneck
– in-band access: the control path is the same as the data path
• pNFS allows clients to access storage
directly and in parallel
– Separation of data and metadata
– Direct access to the data servers
– Out-of-band metadata access
Advanced Research Computing
pNFS architecture
[Diagram: a pNFS client talks to an NFS metadata server (MDS) for metadata and directly to multiple NFS data servers for data.]
Advanced Research Computing
pNFS Parallel Access
• pNFS exploits the parallelism of parallel file systems like GPFS
– pNFS obtains the file layout and access protocol from the metadata server (a conceptual sketch follows below)
• Access protocols:
– file-based (NFS v4.1)
– block-based and object-based
• A client accesses the data servers using one of the supported access protocols
– For a single client to access one file in parallel, the block-based protocol is needed
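A conceptual sketch of the pNFS split described above, with the file-based flavor in mind: the client fetches a layout from the metadata server once (out of band), then issues reads straight to the data servers in parallel. All class and method names here are hypothetical stand-ins, not the NFS v4.1 wire protocol or any real client API.

```python
# Schematic pNFS-style read: out-of-band layout from the MDS, then direct
# parallel reads from the data servers. Every name here is a hypothetical stand-in.
from concurrent.futures import ThreadPoolExecutor

class MetadataServer:
    """Stand-in for the NFS MDS: hands out the file layout (striping pattern)."""
    def layout_get(self, path):
        return {"stripe_size": 1 << 20, "data_servers": ["ds0", "ds1", "ds2"]}

class DataServer:
    """Stand-in for an NFS data server holding some of the file's stripe units."""
    def read(self, path, offset, length):
        return b"\0" * length       # placeholder for the real stripe data

def pnfs_read(mds, servers, path, offset, length):
    """servers: dict mapping a data-server name to its DataServer stand-in."""
    layout = mds.layout_get(path)                 # metadata access, out of the data path
    size, names = layout["stripe_size"], layout["data_servers"]
    jobs, pos = [], offset
    with ThreadPoolExecutor() as pool:            # one request per stripe unit, in parallel
        while pos < offset + length:
            n = min(size - pos % size, offset + length - pos)
            ds = servers[names[(pos // size) % len(names)]]
            jobs.append((pos, pool.submit(ds.read, path, pos, n)))
            pos += n
    return b"".join(fut.result() for _, fut in sorted(jobs, key=lambda j: j[0]))

servers = {name: DataServer() for name in ("ds0", "ds1", "ds2")}
data = pnfs_read(MetadataServer(), servers, "/export/file", offset=0, length=3 * (1 << 20))
assert len(data) == 3 * (1 << 20)
```

The essential point is the shape of the exchange: one metadata round trip to the MDS, then data flowing directly between the client and many data servers instead of through a single NFS head.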
Advanced Research Computing
• Scale-out NAS (SoNAS)
– Exposes a parallel file system to remote
clients
– File servers typically not on the LAN of the
compute nodes (clients)
– SoNAS = Parallel File System +
Parallel Access Protocol
From Parallel File Systems to SoNAS
Advanced Research Computing
• Scale-out NAS (SoNAS) means
– Parallel File System + Parallel Access Protocol
• Some vendors apply the SoNAS label to systems that are missing either
– a parallel file system
• they have just a global namespace over distributed storage
– or a parallel access protocol
• they have multiple ports for multiple clients, but each client uses one port (or bonded ports)
Scale-out NAS (SoNAS)
Advanced Research Computing
pNFS over GPFS
[Diagram: a pNFS client talks to an NFS MDS and to NFS data servers; each of these servers is also a GPFS NSD server.]
Advanced Research Computing
[Diagram: compute nodes running pNFS clients reach, over a LAN/WAN interconnect, server nodes that each run a GPFS NSD server together with an NFS data server (one of them also acting as the NFS MDS); each server node has its own storage devices.]
SoNAS: pNFS over GPFS
Advanced Research Computing
THE POST-RAID ERA
Advanced Research Computing
• While hard disk capacity has kept increasing, disk bandwidth has stayed almost constant
– per-disk throughput is about 80 MB/s
• Reconstructing a 3 TB disk takes over 10 hours even in the best case (in practice, several times more because of concurrent workload); a quick calculation follows below
• RAID is becoming a bottleneck for terabyte-scale disk drives
Entering the post-RAID era
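The rebuild-time figure above is simple arithmetic; here is the calculation, assuming the ~80 MB/s per-disk streaming rate quoted on the slide and an otherwise idle array (both assumptions, and both optimistic in practice).

```python
# Best-case RAID rebuild time: the replacement disk must be written end to end
# at the per-disk streaming rate (~80 MB/s per the slide), with no competing I/O.

def rebuild_hours(capacity_tb: float, throughput_mb_s: float = 80.0) -> float:
    bytes_total = capacity_tb * 1e12              # TB -> bytes (decimal units)
    seconds = bytes_total / (throughput_mb_s * 1e6)
    return seconds / 3600

print(f"3 TB disk: {rebuild_hours(3):.1f} h minimum")   # ~10.4 h
print(f"8 TB disk: {rebuild_hours(8):.1f} h minimum")   # ~27.8 h
```

Concurrent application I/O and RAID parity computation typically stretch this by several times, which is the bottleneck the slide refers to.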
Advanced Research Computing
• Error-correcting codes are added as part of file striping
– file-level error correction, rather than block-level
• Erasure code: transform a k-symbol word into an n-symbol code, n > k, such that any k of the n symbols are sufficient to recover the word (a minimal example follows below)
• Data from a failed disk is reconstructed into currently available space, which can be spread over multiple disks
Distributed erasure codes (1)
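A minimal sketch of the k-of-n idea, using the simplest possible erasure code: k data symbols plus a single XOR parity symbol, so n = k + 1 and any k of the n symbols recover the word. Production systems (OneFS's FlexProtect, for example) use stronger Reed-Solomon-style codes that survive several simultaneous losses; this toy code survives exactly one.

```python
# Toy erasure code: k data symbols plus one XOR parity symbol (n = k + 1).
# Any k of the n symbols are sufficient to recover the original word.
from functools import reduce

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_symbols):
    """Return the n = k + 1 code symbols: the data plus one parity symbol."""
    return data_symbols + [reduce(_xor, data_symbols)]

def recover(symbols):
    """Rebuild the k data symbols when at most one symbol is missing (None)."""
    missing = [i for i, s in enumerate(symbols) if s is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code tolerates only one lost symbol")
    if missing and missing[0] < len(symbols) - 1:        # a data symbol was lost
        survivors = [s for s in symbols if s is not None]
        symbols[missing[0]] = reduce(_xor, survivors)    # XOR of the k survivors
    return symbols[:-1]                                  # drop the parity symbol

word = [b"AAAA", b"BBBB", b"CCCC"]     # k = 3 data symbols of one stripe
coded = encode(word)                    # n = 4 symbols, written to 4 different disks
coded[1] = None                         # one "disk" fails
assert recover(coded) == word           # rebuilt from the k surviving symbols
```

Note where the rebuilt symbol goes: nothing ties it to the failed disk, so the reconstruction can land in whatever free space is available, spread over the remaining disks.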
Advanced Research Computing
• Distributed erasure codes are
gradually replacing RAID for cluster
storage and cloud storage
• Isilon is a pioneer in using erasure
coding instead of RAID
Distributed erasure codes (2)
Advanced Research Computing
EMC ISILON S AND X SERIES
Advanced Research Computing
Isilon S series and X series (1)
• Isilon S200 and X200/X400 are storage clusters
– a cluster is composed of storage appliances containing CPUs, memory, NVRAM, InfiniBand HBAs, and storage media
• The native file system uses file striping for performance and erasure coding for reliability
• With pNFS, scalable performance is possible
• The S and X series are real SoNAS clusters
Advanced Research Computing
[Diagram: NFS clients reach, over a LAN/WAN interconnect, a cluster of nodes that each run OneFS and serve NFS v4 from their local storage devices; the cluster nodes are connected to one another by an InfiniBand fabric.]
Isilon S series and X series (2)
Advanced Research Computing
Attribute | S200 | X200 | X400
Number of storage nodes (all models) | 3 to 144
Disks or SSDs per node | 24 | 6-12 | 32-36
HDD type | 2.5" SAS 10K RPM | 3.5" SATA 7.2K RPM | 3.5" SATA 7.2K RPM
Networking (all models) | 2x 1 GbE, 2x 10 GbE, 2x InfiniBand
Globally coherent cache | 13.8 TB | 6.9 TB | 27.6 TB
Automatic data balancing (all models) | yes
Processor and NVRAM (all models) | 2x 4-core Xeon, 512 MB NVRAM
Memory | 96 GB DRAM | 48 GB DRAM | 192 GB DRAM
Storage protocols (all models) | NFS v3 and v4, CIFS, HDFS
Performance | 1.1 million NFS IOPS, 100 GB/s | 100 GB/s | 100 GB/s
Maximum capacity | 2.1 PB/cluster | 5.2 PB/cluster | 15.5 PB/cluster
EMC Isilon S series and X series
Advanced Research Computing
Isilon OneFS (1)
• The S and X series run the OneFS operating system, which includes
– FlexProtect: generates the erasure codes
• OneFS supports NFS v3 and v4
– parallel transfers are possible if an NFS v4 client is installed on the cluster nodes
– NFS v3 supports Kerberos
Advanced Research Computing
NETAPP FAS 3200 SERIES
Advanced Research Computing
• FAS 3200 is a distributed storage system
– a controller box can hold only a limited number of HDDs
– the controller is the NAS head
• It is a NAS-head architecture with multiple heads to mitigate the bottleneck
– not a real SoNAS
– in Cluster-Mode it supports only file-based pNFS
• The Write Anywhere File Layout (WAFL) provides quick file growth and zero-copy snapshots (a toy sketch of the copy-on-write idea follows below)
NetApp FAS 3200 Series
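A toy sketch of the zero-copy snapshot idea mentioned above, in the copy-on-write style: taking a snapshot only freezes the current block pointers, and old blocks survive as long as a snapshot still references them. This is purely illustrative Python, not how WAFL is actually implemented.

```python
# Toy copy-on-write snapshots: a snapshot is a frozen copy of the block-pointer
# list; data blocks are shared until the live file writes over them.

class CowFile:
    def __init__(self):
        self.blocks = []          # live file: references to data blocks
        self.snapshots = {}       # snapshot name -> frozen list of block references

    def append(self, data: bytes):
        self.blocks.append(data)

    def snapshot(self, name: str):
        self.snapshots[name] = list(self.blocks)   # "zero-copy": only pointers are copied

    def write(self, index: int, data: bytes):
        # Write-anywhere style: the live file points at a new block; a snapshot
        # that still references the old block keeps it alive, unchanged.
        self.blocks[index] = data

f = CowFile()
f.append(b"v1-block0"); f.append(b"v1-block1")
f.snapshot("snap1")                  # instant: no data blocks are copied
f.write(1, b"v2-block1")             # the live file diverges from the snapshot
assert f.snapshots["snap1"][1] == b"v1-block1"
assert f.blocks[1] == b"v2-block1"
```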
Advanced Research Computing
[Diagram: NFS clients reach two controller heads, each running Data ONTAP as an NFS server with its own RAID controller and storage devices.]
NetApp FAS 3200: kind of SoNAS
Advanced Research Computing
Data ONTAP and pNFS
Advanced Research Computing
Attribute | FAS 3240
Processor cores | 8
Memory | 16 GB
Maximum Flash Cache | 1 TB
On-board I/O | 4x 6 Gb SAS, 4x 4 Gb FC
External networking | 4x GbE
Internal networking (HA) | 2x 10 GbE
Maximum number of disks | 600
Maximum capacity | 1.2 PB
Storage protocols | NFS, CIFS, iSCSI, FTP
NetApp FAS 3240 specs
Advanced Research Computing
Data Protection: Data ONTAP vs OneFS
Advanced Research Computing
Thank you.
Questions?
