1
Storage Infrastructure
for HPC
Gabriel Mateescu
mateescu@acm.org
Overview
• Data-intensive science
• Architecture of Parallel Storage
• Parallel File Systems
– GPFS, Lustre, PanFS
• Data Striping
• Scale-out NAS and pNFS
• IO acceleration
2
The 4th paradigm of science
• Experiment
• Theory: models
• Computational Science: simulations
• Data-intensive science
– Unifies theory, experiment, and simulation
– Digital information processed by software
– Capture, curation, and analysis of data
– Creates a data explosion effect
3
Data Explosion 1
• Explosion of data volume
– The amount of data doubles every two years
– Number of files grows faster
• Challenges:
– Disk bandwidth growth lags compute bandwidth
growth
– Data management: migration to appropriate
performance tier, replication, backup, compression
– Capacity provisioning
4
Data Explosion 2
• Turning data into actionable insights
requires solving all these challenges
– Enough storage capacity
– Data placement and migration
– Data transfer bandwidth
– Data discovery
• New technology needed to handle massive
data sizes and file counts
– Access, preservation and movement of data
requires high-performance, scalable storage
5
Early days of HPC storage
6
• One file per compute node
• Hard to manage; data stage-in and stage-out needed
Diagram: four compute nodes (Node 0 to Node 3), each with its own local file system holding its own file (File 0 to File 3)
Parallel and shared storage
7
• All compute nodes can access all files
• Multiple compute nodes can access the same file concurrently
Diagram: four compute nodes (Node 0 to Node 3), all accessing files A to D through a shared and parallel file system
Parallel Storage
• Parallel storage system
– Aggregate a large number of storage devices to
provide a system whose devices can be
accessed concurrently by many clients
– Ideally, the throughput of the system is the sum
of the throughput of the storage devices
• Parallel file system
– Global namespace on top of the storage
system: all clients see the same filenames
– Global address space: all clients see the same
address space of a given file
8
Network Attached Storage
Diagram: two file system servers, each with a RAID controller and storage devices, serving client nodes over an interconnect fabric (10GE, InfiniBand)
Directly Attached Storage
Diagram: compute nodes acting as file system servers, attached to RAID controllers and storage devices through a storage interconnect fabric (10GE, FCoE, InfiniBand) and connected to each other through a cluster interconnect fabric (10GE, InfiniBand)
Scale-out NAS (SoNAS)
Diagram: file system servers with RAID controllers and storage devices on a storage interconnect fabric (10GE, FCoE, InfiniBand), serving client nodes over a WAN interconnect
Parallel File System vs SoNAS
• Parallel file system
– Provides high throughput to one file by striping
the file across several storage devices
– Client nodes may also be file system servers
• Scale-out NAS (SoNAS)
– Parallel File System + Parallel Access Protocol
– File system servers typically not on the LAN of
the compute nodes (clients)
12
LUN and RAID
• A LUN is a logical volume made out of multiple physical disks
• Typically, a LUN is built as a RAID array
– RAID offers redundancy and/or concurrency
• There are several RAID types
– RAID0: striping
– RAID6: striping and two parity blocks
• 8 data disks + 2 parity disks
• Parity blocks are distributed across the 10 disks
13
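To make the 8+2 arithmetic above concrete, here is a minimal Python sketch of the capacity and parity overhead of one RAID6 LUN; the 8+2 layout comes from the slide, while the 2 TB disk size is an assumed value:

  # Sketch: capacity and parity overhead of one RAID6 (8 data + 2 parity) LUN.
  data_disks, parity_disks = 8, 2
  disk_tb = 2.0                              # assumed capacity per disk, in TB
  disks_per_lun = data_disks + parity_disks  # 10 disks per LUN

  raw_tb = disks_per_lun * disk_tb           # 20 TB raw
  usable_tb = data_disks * disk_tb           # 16 TB of user data
  overhead = parity_disks / disks_per_lun    # 20% of raw capacity holds parity

  print(f"raw {raw_tb} TB, usable {usable_tb} TB, parity overhead {overhead:.0%}")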
Striping
• RAID stripe: a sequence of blocks that contains one block from each disk of a LUN
– Stripe width = number of disks per LUN
– Stripe depth = size of the block written to each disk
– Stripe size = Stripe width × Stripe depth
• File system stripe: a sequence of blocks (segments) that contains one block from each LUN
– Stripe width = number of LUNs
– Stripe depth, aka block size
14
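A minimal sketch of the two striping levels above, assuming byte offsets are laid out round-robin; the widths and depths below are illustrative assumptions, not values prescribed by the slides:

  # Sketch: map a file offset to a LUN (file-system striping) and then to a
  # data disk within that LUN (RAID striping). All sizes are assumed values.
  FS_STRIPE_WIDTH = 4            # number of LUNs
  FS_STRIPE_DEPTH = 4 << 20      # file-system block size: 4 MiB per LUN
  RAID_STRIPE_WIDTH = 8          # data disks per LUN (RAID6 8+2)
  RAID_STRIPE_DEPTH = 512 << 10  # 512 KiB written to each disk

  def locate(offset: int):
      fs_block = offset // FS_STRIPE_DEPTH
      lun = fs_block % FS_STRIPE_WIDTH           # which LUN holds this block
      lun_offset = offset % FS_STRIPE_DEPTH      # offset inside the 4 MiB block
      raid_block = lun_offset // RAID_STRIPE_DEPTH
      disk = raid_block % RAID_STRIPE_WIDTH      # which data disk inside the LUN
      return lun, disk

  print(locate(0))            # (0, 0)
  print(locate(5 << 20))      # offset 5 MiB -> LUN 1, disk 2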
Scaling
• Capacity scaling
– cores/node, memory/node, node count
– storage size, network switches
• Performance scaling
– GFlops, Instructions/cycle, Memory bandwidth
– IO throughput: large or small file, metadata
• IO scaling requires a balanced system
architecture to avoid bottlenecks
15
Scaling Bottlenecks
16
Storage wall
• As the system size (CPUs, memory,
interconnect, the number of compute nodes)
increases, providing scalable IO throughput
becomes very expensive
• Ken Batcher, recipient of the Seymour Cray award, put it this way:
– A supercomputer is a device for turning
compute-bound problems into IO-bound
problems
17
IBM GPFS (1)
18
• General Parallel File System
– Supports both architectures
• network-attached: software or hardware RAID
• directly-attached
– Network Shared Disk
• Cluster-wide naming
• Access to data
– Full POSIX semantics
• Atomicity of concurrent read and write operations
GPFS Directly Attached Storage
Diagram: compute nodes running the GPFS client and the GPFS NSD server, attached to RAID controllers and storage devices through a storage interconnect fabric (10GE, FCoE, InfiniBand) and connected to each other through a cluster interconnect fabric (10GE, InfiniBand)
GPFS Network Attached Storage
Diagram: storage nodes running the NSD server with directly attached storage devices, serving compute nodes that run the GPFS client and see the storage as NSDs, over an interconnect fabric (10GE, InfiniBand)
HA for Network Attached Storage
Diagram: two storage nodes and two storage arrays in a failover pair
• If a storage node fails, the load on the other storage node doubles
• Tolerates failure of one out of two nodes
Triad HA
Diagram: three storage nodes and three storage arrays in a failover triad
• If a storage node fails, the load on each of the other two storage nodes grows by 50%
• Tolerates failure of two out of three nodes
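The load figures on the two slides above follow from a simple rule: with N storage nodes in the group, each survivor picks up 1/(N-1) of a failed node's load (assuming the load is spread evenly). A one-line check:

  # Load increase on each surviving node after one of n storage nodes fails.
  def load_increase(n: int) -> float:
      return 1.0 / (n - 1)

  print(f"pair:  +{load_increase(2):.0%}")   # +100% -> the survivor's load doubles
  print(f"triad: +{load_increase(3):.0%}")   # +50%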
IBM GPFS (2)
23
• Nodeset: group of nodes that operate on the
same file systems
• GPFS management servers
– cluster data server: one or two per cluster
• cluster configuration and file system information
– file system manager: one per file system
• disk space allocation, token management server
– configuration manager: one per nodeset
• Selects file system manager
– metanode: one per opened file
• handles updates to metadata
GPFS Scaling
• GPFS metanodes
– Each directory is assigned to a metanode that manages it, e.g., locking
– Each file is assigned to a metanode that manages it, e.g., locking
• The metanode may become a bottleneck
– One file per task: puts pressure on the directory metanode for large jobs, unless a directory hierarchy is created
– One shared file: puts pressure on the file metanode for large jobs
24
GPFS Striping (1)
• GPFS-level striping: spread the blocks of a
file across all LUNs
– Stripe width = number of LUNs
– GPFS block size = block stored in a LUN
• RAID-level striping
– Assume RAID6 with 8+2P, block-level striping
– Stripe width is 8 (8 + 2P)
– Stripe depth is the size of a block written to one
disk; a multiple of the sector size, e.g., 512 KiB
– Stripe size = Stripe depth × Stripe width = 8 × 512 KiB = 4 MiB
25
GPFS Striping (2)
• GPFS block size
– equal to the RAID stripe size = 4 MiB
• Stripe width impacts aggregate bandwidth
– GPFS stripe width equal to the number of LUNs maximizes throughput per file
– RAID Stripe Width of 8 (8+2P) for RAID6
balances performance and fault tolerance
• Applications should issue writes that are
– a multiple of the GPFS block size and aligned with GPFS block boundaries
26
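A minimal sketch of the block-size arithmetic and the alignment rule above; the 8+2 RAID6 layout and the 512 KiB stripe depth come from the slides, while the example IO requests are assumed:

  KIB, MIB = 1 << 10, 1 << 20

  raid_stripe_width = 8          # data disks in an 8+2 RAID6 LUN
  raid_stripe_depth = 512 * KIB  # block written to each disk
  gpfs_block_size = raid_stripe_width * raid_stripe_depth
  assert gpfs_block_size == 4 * MIB   # GPFS block size = RAID stripe size = 4 MiB

  def well_formed(offset: int, size: int) -> bool:
      """True if an IO request is aligned to and a multiple of the GPFS block size."""
      return offset % gpfs_block_size == 0 and size % gpfs_block_size == 0

  print(well_formed(0, 8 * MIB))        # True: aligned start, two full blocks
  print(well_formed(1 * MIB, 4 * MIB))  # False: misaligned start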
Impact of IO Block Size
27
Chart: disk throughput (MB/sec) vs IO request size (bytes) for a 1 TB SAS Seagate Barracuda ES2 disk
Handling Small Files
• Small files do not benefit from GPFS striping
• Techniques used for small files
– Read-ahead: pre-fetch the next disk block
– Write behind: buffer writes
• These are used by other parallel file
systems as well
– For example, Panasas PanFS
28
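A toy illustration of write-behind (not GPFS or PanFS code): small writes are accumulated in memory and flushed in large, block-sized chunks so the disks see big sequential IO. The 4 MiB flush unit is an assumed value:

  # Toy write-behind buffer: coalesce small writes into block-sized flushes.
  # Purely illustrative; real file systems do this in the client/page cache.
  import io

  BLOCK = 4 << 20  # assumed 4 MiB flush unit

  class WriteBehind:
      def __init__(self, f):
          self.f, self.buf = f, bytearray()

      def write(self, data: bytes):
          self.buf += data
          while len(self.buf) >= BLOCK:      # flush only full blocks
              self.f.write(self.buf[:BLOCK])
              del self.buf[:BLOCK]

      def close(self):
          if self.buf:                       # flush the tail on close
              self.f.write(self.buf)
          self.f.flush()

  sink = io.BytesIO()
  wb = WriteBehind(sink)
  for _ in range(5000):
      wb.write(b"x" * 1024)                  # many 1 KiB application writes
  wb.close()
  print(len(sink.getvalue()))                # 5120000 bytes reached the "disk"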
Lustre file system 1
• Has the network-attached architecture
• Object-based storage
– Uses storage objects instead of blocks
– Storage objects are units of storage that have
variable size, e.g., an entire data structure or
database table
– File layout gives the placement of objects rather
than blocks
• Users can set the stripe width and depth, and the file layout
29
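A minimal sketch of how a RAID0-style Lustre layout maps a file offset to an object: the file is cut into stripe-size chunks assigned round-robin to the objects. The stripe count and stripe size are per-file settings (typically chosen with the lfs setstripe tool); the values below are assumptions for illustration:

  # Sketch of Lustre-style striping over stripe_count OST objects.
  STRIPE_COUNT = 4          # number of objects (OSTs) the file is striped over
  STRIPE_SIZE = 1 << 20     # 1 MiB per stripe

  def object_for_offset(offset: int):
      chunk = offset // STRIPE_SIZE
      ost_index = chunk % STRIPE_COUNT       # which object/OST holds the chunk
      offset_in_object = (chunk // STRIPE_COUNT) * STRIPE_SIZE + offset % STRIPE_SIZE
      return ost_index, offset_in_object

  print(object_for_offset(0))               # (0, 0)
  print(object_for_offset(5 * (1 << 20)))   # offset 5 MiB -> object 1, offset 1 MiB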
Lustre Architecture
Diagram: Lustre clients on an interconnect fabric (10GE, InfiniBand), accessing a Metadata Server (MDS) with its Metadata Target and two Object Storage Servers (OSS), each serving multiple Object Storage Targets (OSTs)
Lustre file system 2
• Metadata server (MDS)
– Manages file metadata and the global namespace
• Object storage server (OSS)
– The software that fulfills requests from clients and gets/stores data on one or more Object Storage Targets (OSTs)
– An OST is a logical unit (LUN), which can consist of one or more disk drives (RAID)
• Management Server (MGS)
– Can be co-located with the MDS/MDT
31
Parallel NFS (pNFS)
• pNFS allows clients to access storage
directly and in parallel
– Separation of data and metadata
– Direct access to the data servers
– Out-of-band metadata access
• Storage access protocols:
– File: NFS v4.1
– Object: object-based storage devices (OSD)
– Block: iSCSI, FCoE
32
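A toy model of the pNFS control/data split described above (not real NFS code): the client asks the metadata server for a layout out of band, then fetches the byte ranges directly from the data servers in parallel. The server names, file name, and chunk layout are invented placeholders:

  # Toy pNFS-style read: out-of-band layout from the MDS, direct parallel data access.
  # Illustrative only; real pNFS uses NFSv4.1 layouts (file, object, or block).
  from concurrent.futures import ThreadPoolExecutor

  data_servers = {                 # assumed data servers holding 10-byte chunks
      "ds0": {0: b"A" * 10},
      "ds1": {1: b"B" * 10},
  }

  def mds_get_layout(filename):
      # The MDS returns only metadata: which server holds which chunk.
      return [("ds0", 0), ("ds1", 1)]

  def read_chunk(server, chunk_id):
      return data_servers[server][chunk_id]    # direct access, bypassing the MDS

  def pnfs_read(filename):
      layout = mds_get_layout(filename)
      with ThreadPoolExecutor() as pool:
          parts = pool.map(lambda sc: read_chunk(*sc), layout)
      return b"".join(parts)

  print(pnfs_read("/scratch/example.dat"))     # b'AAAAAAAAAABBBBBBBBBB'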
pNFS architecture
Diagram: pNFS clients access an NFS metadata server (MDS) for metadata and read/write data directly on the NFS data servers
pNFS over Lustre
Diagram: pNFS clients access an NFS MDS backed by the Lustre MDS and NFS data servers backed by the Lustre OSSs
Panasas Solution (1)
• SoNAS based on
– PanFS: Panasas ActiveScale file system
– pNFS or DirectFlow: Parallel access protocol
• Architecture
– Director Blade: MDS and management
– Storage Blade: storage nodes
• Disk: 2 or 3 TB/disk, 75 MB/s, one or two disks
• SSD (some models): 32 GB SLC
• CPU + Cache
– Shelf = 1 director blade + 10 storage blades
35
pNFS over PanFS
Diagram: pNFS clients access an NFS MDS running on the Director Blade and NFS data servers running on the Storage Blades
Panasas Solution (2)
• Feeds and Speeds
– Shelf: 10 storage blades + 1 director blade
• Disk Size = 10 * 6 TB = 60 TB
• Disk Throughput: 10 * (2 * 75 MB/s) = 1.5 GB/s
– Rack: 10 shelves
• Size = 600 TB
• Throughput: 15 GB/s
– System: 10 racks
• Size = 6 PB
• Throughput: 150 GB/s
37
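The capacity and throughput figures above follow from simple multiplication; a sketch that reproduces them from the per-blade numbers on the previous slide (taking the two-disk, 3 TB option, with 75 MB/s per disk):

  # Reproduce the shelf/rack/system feeds-and-speeds from per-blade figures.
  blades_per_shelf = 10
  shelves_per_rack = 10
  racks_per_system = 10

  tb_per_blade = 2 * 3             # two 3 TB disks per storage blade
  mbps_per_blade = 2 * 75          # two disks at 75 MB/s each

  shelf_tb = blades_per_shelf * tb_per_blade              # 60 TB
  shelf_gbps = blades_per_shelf * mbps_per_blade / 1000   # 1.5 GB/s

  print(f"shelf:  {shelf_tb} TB, {shelf_gbps} GB/s")
  print(f"rack:   {shelf_tb * shelves_per_rack} TB, {shelf_gbps * shelves_per_rack} GB/s")
  print(f"system: {shelf_tb * shelves_per_rack * racks_per_system / 1000} PB, "
        f"{shelf_gbps * shelves_per_rack * racks_per_system} GB/s")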
Data vs Computation
Movement
38
• Consider a Lustre cluster with
–100 compute nodes (CNs), each with 1
TB local storage, 80 MB/s per local disk
–10 OSS and 10 OSTs/OSS,
–1TB/OST, 80 MB/s per OST
–4x SDR InfiniBand network with 8 Gbps of data bandwidth, that is, 1 GB/s
Lustre cluster
Diagram: compute nodes with local disks (80 MB/s each) running the Lustre client, connected by an InfiniBand fabric (1 GB/s per link) to the Metadata Server (MDS) with its Metadata Target and to Object Storage Servers (OSS), each serving multiple Object Storage Targets (OSTs)
MapReduce/Lustre
40
• Compute Nodes access data from
Lustre
• Disk throughput per OSS = 10 * 80
MB/s = 800 MB/s
–InfiniBand has 1 GB/s, so it can sustain
this throughput
• Aggregate disk throughput
–10 * 800 MB/s = 8 GB/s
MapReduce on Lustre vs
HDFS
41
• MapReduce/HDFS:
– Compute nodes use local disks
– Per compute-node throughput is 80 MB/s
– Aggregate disk throughput is 100 * 80 MB/s =
8 GB/s
• Aggregate throughput is the same, 8 GB/s
– The interconnect fabric provides enough
bandwidth for the disks
• MapReduce/Lustre is competitive with
MapReduce/HDFS for latency-tolerant work
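A sketch of the throughput comparison above, using the cluster parameters from the "Data vs Computation Movement" slide; on each path the aggregate is limited by the slower of the disks and the network link:

  # Aggregate read throughput: MapReduce over Lustre vs over HDFS-style local disks.
  DISK_MBPS = 80            # per OST and per local disk
  IB_MBPS = 1000            # 4x SDR InfiniBand, ~1 GB/s per link

  # Lustre: 10 OSS, 10 OSTs each; each OSS is capped by its InfiniBand link.
  oss_count, osts_per_oss = 10, 10
  per_oss = min(osts_per_oss * DISK_MBPS, IB_MBPS)   # 800 MB/s, fits in 1 GB/s
  lustre_aggregate = oss_count * per_oss             # 8,000 MB/s

  # HDFS: 100 compute nodes, each reading its own local disk.
  compute_nodes = 100
  hdfs_aggregate = compute_nodes * DISK_MBPS         # 8,000 MB/s

  print(lustre_aggregate, hdfs_aggregate)            # both 8 GB/s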
Data & Compute Trends
• Compute power: 90% per year
• Data volume: 40-60% per year
• Disk capacity: 50% per year
• Disk bandwidth: 15% per year
• Balancing the compute and disk
throughput requires the number of
disks to grow faster than the number of
compute nodes
42
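A sketch of why the number of disks must grow faster than the compute side, using the annual growth rates above (compute 90%/year, per-disk bandwidth 15%/year); the 5-year horizon is an assumed example:

  # If compute throughput grows 90%/year and per-disk bandwidth only 15%/year,
  # the disk count must grow by the ratio of the two to keep the system balanced.
  compute_growth = 1.90
  disk_bw_growth = 1.15

  years = 5
  extra_disk_growth = (compute_growth / disk_bw_growth) ** years
  print(f"disk count must grow ~{extra_disk_growth:.1f}x more than compute in {years} years")
  # roughly 12x over 5 years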
IO Acceleration
43
• Disk bandwidth does not keep up with
memory and network bandwidth
• Hide low disk bandwidth using fast
buffering of IO data
–IO forwarding
–SSDs
Data Staging
44
• Data staging
–IO forwarding or SSDs
• IO forwarding hides disk bandwidth by
–Buffering the IO generated by an application on a staging machine, which frees memory on the supercomputer for the simulation
–Overlapping computation on the supercomputer with IO on the staging machine
Benefits of IO forwarding (1)
45
• Consider a machine with 1 PB of RAM that reaches its peak performance of 1 PFlop/sec when the operational intensity is >= 2 Flop/B
• Consider an application with operational intensity 1 Flop/B that uses 1 PB of RAM, executes 600 PFlop/iteration, and dumps each iterate to disk
• At this operational intensity the machine sustains only 0.1 PFlop/sec, so running the application takes, per iteration,
Tcomp = (600 PFlop/iteration) / (0.1 PFlop/sec) = 6000 sec
Benefits of IO forwarding (2)
46
• We can hide almost all the IO time if we can
– Copy 1 PB to a staging machine in Tfwd << Tcomp
– Write the 1 PB from the staging machine to disk in (Tcomp – Tfwd) ≈ Tcomp
• Assume the staging machine has 64 K nodes, each with a 4x QDR port (4 GB/sec per port); then
Throughput = 64 K * 4 GB/sec = 256 TB/sec
Tfwd = 1024 TB / (64 K * 4 GB/sec) = 4 sec << Tcomp
• So the required disk bandwidth is
BW = (1 PB) / (6000 sec) = 166 GB/sec << 256 TB/sec
• Similar benefit for SSDs
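The arithmetic from the two slides above, collected in one place (numbers are taken from the slides; the slide's 166 GB/sec figure uses decimal petabytes):

  # Reproduce the IO-forwarding arithmetic from the two slides above.
  tcomp = 600 / 0.1                   # 600 PFlop/iteration at 0.1 PFlop/sec -> 6000 sec

  staging_nodes = 64 * 1024           # 64 K staging nodes
  port_gb_per_s = 4                   # 4x QDR InfiniBand, ~4 GB/sec per port
  staging_tb_per_s = staging_nodes * port_gb_per_s / 1024   # 256 TB/sec (1 TB = 1024 GB)

  tfwd = 1024 / staging_tb_per_s      # forward 1 PB = 1024 TB in ~4 sec << Tcomp
  disk_gb_per_s = 1_000_000 / tcomp   # drain 1 PB (~10^6 GB) to disk over Tcomp: ~167 GB/sec

  print(f"Tcomp = {tcomp:.0f} s, Tfwd = {tfwd:.0f} s, "
        f"required disk bandwidth ~ {disk_gb_per_s:.0f} GB/s")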
SSD Metadata Store
• The MDS is a bottleneck for metadata-intensive operations
• Use SSDs for the metadata store
• IBM GPFS with an SSD metadata store
– eight NSD servers with four 1.8 TB, 1.25 GB/s PCIe-attached SSDs; two GPFS clients
– Processes the 6.5 TB of metadata for a file system with 10 billion files in 43 min
– Enables timely policy-driven data management
47
Conclusion
• Parallel storage has evolved similarly
to parallel computation
– Scale by adding disk drives, networking, CPU,
and memory/cache
• Parallel file systems provide direct and
parallel access to storage
– Striping across and within storage nodes
• Staging to SSDs or to another machine hides the low disk bandwidth
48