Today, data drives discovery, and discoveries are key to creating sustained advantages. The better your critical workflows can create and access data, the better you’ll be able to discover new, innovative solutions to important problems or to create entirely new products. More than ever before, data-intensive applications need the sustained performance and virtually unlimited scalability that only parallel storage software delivers.
Designed for maximum performance and scale, storage solutions powered by Lustre software deliver the throughput and capacity that today’s storage requirements demand. As the most widely used parallel storage system for HPC, Lustre-powered storage is an ideal storage foundation.
But scalable, high-performance storage by itself only solves half the problem. Today’s users expect storage solutions that deliver sustained performance, scale upward to near-limitless capacities, and are simple to install and manage. Intel® Enterprise Edition for Lustre* software combines the straight-line speed and scale of Lustre with the bottom-line need for lower management complexity and cost.
As the recognized leader in the development and support of the Lustre file system, Intel has the expertise to make storage solutions for data-intensive applications faster, smarter, and easier.
The Importance of Fast, Scalable Storage for Today’s HPC
1. The Importance of Fast, Scalable Storage for Today’s HPC
Bill Webster
High Performance Data Division, Intel Corporation
For more follow @IntelITCenter on Twitter
2. Some Data About Data….
2.5 quadrillion bytes of data created daily1
>80% of today’s data is unstructured
~90% of the world’s data has been created within the last 2 years
1 Source: IBM
3. The Case for Fast, Scalable Storage
Solving important problems drives technology investments
Fast storage is critical for maximum application performance
Lustre software was created for performance at large scale
Storage fueled by Lustre* is stable, flexible and highly efficient
Lustre is the most widely used parallel storage for HPC1
Over 60% of the fastest 100 HPC sites worldwide rely on Lustre2
1 Source: IDC research
2 Intel analysis of www.top500.org rankings, December 2013
* Some names and brands may be claimed as the property of others.
4. HPC Places Unique Demands on Storage
• Workloads are diverse and dynamic, and applications are compute-intensive or data-intensive – often both
• The value of HPC storage is measured by speed, scale, and IOPS
• To meet these requirements, HPC storage needs to:
• Scale out for increased I/O and capacity
• Perform I/O in parallel for maximum throughput
• Support a virtually unlimited number of clients
• Commercial “HPC” needs the same level of performance
Lustre was architected for speed, scale, and IOPS
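The scale-out point above is simple arithmetic: in an ideal parallel store, aggregate bandwidth grows roughly linearly with the number of object storage servers, because clients drive all of them concurrently. A minimal sketch (the per-server figure is an illustrative assumption, not a measurement from this deck):

```python
def aggregate_bandwidth_gbs(num_oss, per_oss_gbs):
    """Ideal aggregate throughput of a scale-out parallel store:
    clients drive all object storage servers (OSS) at once, so
    bandwidth adds up across servers."""
    return num_oss * per_oss_gbs

# e.g. 100 OSS at an assumed 5 GB/s each -> 500 GB/s, the order of
# magnitude quoted for large Lustre production sites in this deck.
print(aggregate_bandwidth_gbs(100, 5))  # -> 500
```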
6. What is Lustre*?
Open source, distributed, parallel, clustered file system
Designed for maximum performance at massive scale
POSIX compliant – key for supporting applications
Global, shared name space – all clients can access all data
Very resource efficient and cost effective
* Some names and brands may be claimed as the property of others.
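POSIX compliance is what makes the “key for supporting applications” bullet concrete: programs use ordinary file I/O and need no Lustre-specific code. A minimal sketch in standard Python (runnable on any POSIX file system; on a cluster the path would simply be a file under the Lustre mount, e.g. something like /mnt/lustre/results.dat):

```python
import os
import tempfile

# On a Lustre client this would be a path under the Lustre mount point;
# because Lustre is POSIX compliant, any POSIX file system behaves the
# same, so a temporary directory stands in here.
path = os.path.join(tempfile.mkdtemp(), "results.dat")

# Standard POSIX operations -- open, write, seek, read -- unchanged.
with open(path, "wb") as f:
    f.write(b"checkpoint-0001")

with open(path, "rb") as f:
    f.seek(11)                 # random access, as POSIX guarantees
    print(f.read().decode())   # -> 0001
```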
7. What Makes Lustre* So Important?
Purpose-built for speed and scale:
Speed: Unmatched performance
Openness: Choice of storage platforms
Efficiency: Achieves 90%+ utilization of storage resources
Affordability: Low CAPEX and OPEX
Scale-out: Independently scale storage capacity and bandwidth
Stable and reliable: Backed by Intel, the worldwide leader in Lustre support
* Some names and brands may be claimed as the property of others.
8. Good-Fit Applications for Lustre*…
Financial analysis – modeling risk exposure and portfolio valuation
Geosciences – weather forecasting and climate modeling
Bioinformatics – genomics, proteomics, drug discovery
Energy – exploration, reservoir modeling, wind energy
Engineering – CAE, CFD, and FEA for aerospace, automotive
SCIENCE | ANALYTICS | ENGINEERING
* Some names and brands may be claimed as the property of others.
9. What Does a Lustre* Solution Look Like?
[Architecture diagram – components:]
• Management network
• High-performance data network (InfiniBand, 10GbE)
• Metadata servers, with Metadata Target (MDT)
• Management servers, with Management Target (MGT), running Intel Manager for Lustre* (requires Enterprise Edition)
• Object Storage Servers, with Object Storage Targets (OSTs)
• Lustre clients – diskless compute servers
* Some names and brands may be claimed as the property of others.
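The OSTs in the diagram are what make parallel throughput possible: each file is striped across multiple targets so clients can read and write all of them at once. A conceptual sketch of the round-robin placement (pure Python; the stripe count and stripe size are illustrative choices, not Lustre defaults):

```python
def stripe_layout(file_size, stripe_count=4, stripe_size=1 << 20):
    """Map each stripe-size chunk of a file to an OST index,
    round-robin -- a simplified model of how Lustre distributes
    file data across Object Storage Targets."""
    chunks = (file_size + stripe_size - 1) // stripe_size
    return [chunk % stripe_count for chunk in range(chunks)]

# A 6 MiB file striped over 4 OSTs: chunks land on OSTs 0,1,2,3,0,1,
# so I/O proceeds in parallel across all four storage servers.
print(stripe_layout(6 * (1 << 20)))  # -> [0, 1, 2, 3, 0, 1]
```

On a real system the layout is controlled per file or directory with the `lfs setstripe` utility (for example `lfs setstripe -c 4 <file>` to request four stripes); actual defaults vary by site.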
10. Management Servers
(Same architecture diagram as slide 9, with the metadata and management servers highlighted – callout 1.)
* Some names and brands may be claimed as the property of others.
11. Storage Servers
(Same architecture diagram, with the Object Storage Servers and their OSTs highlighted – callout 2.)
* Some names and brands may be claimed as the property of others.
12. Compute Clients
(Same architecture diagram, with the Lustre clients – diskless compute servers – highlighted – callout 3.)
* Some names and brands may be claimed as the property of others.
13. Interconnect Fabric
(Same architecture diagram, with the management network and high-performance data network highlighted – callout 4.)
* Some names and brands may be claimed as the property of others.
14. The Results? Fast, Scalable Storage and I/O
(Same architecture diagram as slide 9.)
• Over 2 TB/s achieved
• 500–750 GB/s in production
• 80,000+ IO/s
* Some names and brands may be claimed as the property of others.
16. Intel® Enterprise Edition for Lustre*
Intel® Manager for Lustre* is the heart of all Intel EE for Lustre-based solutions.
* Some names and brands may be claimed as the property of others.
17. Intel® Manager for Lustre*
The ‘dashboard’ canvas displays a variety of charts that illustrate performance levels and resource utilization.
Visual system status indicator
Configure, create, and optimize Lustre file systems
Intelligent, intuitive logging – understand quickly and easily how your storage is performing
* Some names and brands may be claimed as the property of others.
19. The Convergence of HPC and Big Data
• Big Data problems are getting larger
• More compute power. More files. More capacity and data throughput
• MapReduce workloads are being added to HPC environments
• 1 in 3 HPC sites have deployed Hadoop1
• But MapReduce workloads run differently than typical HPC applications
• Compute nodes are diskless – no local storage
• By default, Hadoop expects local storage within each node
• Lustre storage accelerates the value of Hadoop
• Improves application performance
• Boosts storage efficiency and lowers management complexity
* Some names and brands may be claimed as the property of others.
1 Source: IDC research
20. Intel® Enterprise Edition for Lustre* software includes the Hadoop ‘adapter’ for Lustre
• Replacement for HDFS
• Shared, parallel storage optimizes performance
• Lowers management complexity
• Maximizes utilization of storage resources
* Some names and brands may be claimed as the property of others.
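Because the adapter replaces HDFS underneath Hadoop’s standard file-system interface, pointing jobs at Lustre is a configuration change rather than a code change. A hedged sketch of what the core-site.xml entry might look like – the URI scheme and path here are illustrative assumptions, not taken from this deck; consult the Intel EE for Lustre documentation for the actual property values and file-system implementation class:

```
<!-- core-site.xml: illustrative only. The "lustre" scheme and mount
     path below are assumptions for the sketch, not documented values. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- Shared Lustre mount visible to every compute node -->
    <value>lustre:///mnt/lustre/hadoop</value>
  </property>
</configuration>
```

Because the storage is shared and global, every node sees the same namespace, which is what removes the need for per-node local disks.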
21. Case Study: Wellcome Trust Sanger Institute
Challenge: Improved processes and lab equipment led to exponential increases in the volume of data being generated – but storage budgets were growing slowly. Large data sets are difficult to proactively manage and can easily overwhelm storage resources. Unoptimized storage had a direct, negative impact on application performance – slowing the time to breakthrough results.
Solution: Exploit the power and scale of HPC-class storage, powered by Lustre* software and supported by Intel.
Benefits provided:
Openness – broad array of storage vendors and products
Global namespace – all clients can access all data
Performance – upwards of 1 TB/s
Capacity – virtually unlimited file system and per-file sizes
Confidence – backed by Intel expertise with Lustre
• 10–15 TB of processed data weekly
• Processed data is a small fraction of overall storage capacity
• Stored in an iRODS data warehouse
• BAM or FASTA format files
• Uses pattern-matching algorithms like BWA and BLAST
• Lustre offers immense, scalable capacity
• Now have 8 production Lustre file systems – and are planning to add more
• Performance was the main goal – but scale, flexibility, and efficiency were critical
* Some names and brands may be claimed as the property of others.
Just like we saw in HPC 10 years ago. We’ve been dealing with it – and have solutions such as Lustre. Object storage comment: it changes the applications.
Points to be made here:
Data is growing at exponential rates
Scale-out, parallel file systems are currently ahead of the game
Lustre has several sites at >1 GB/s and 55 PB formatted
Lustre is the market leader for parallel FS in HPC with 27.5% market share
As I said – it’s a mature and dependable technology for HPC
Big Data is often (incorrectly) equated to Hadoop
Hadoop relies on local disk
Hadoop throws away 75% of its resources
Lustre, optimized for Hadoop (IDH + IEEL), is the obvious answer for HPC sites
It may even be the answer for Enterprise Hadoop – the debate is just starting
Cloud is building momentum for HPC
Lustre on AWS
Cloud-bursting HPC jobs is possible
Bio-genomics sites reportedly buying 10 GigE uplinks into AWS
Next stop: OpenStack
More than ever before, today’s most important applications and workflows are used to create sustained competitive advantages for our customers – whether they’ve been long-time users of HPC, or new, often commercial enterprises seeking to exploit high-performance storage to power critical commercial applications. Enterprises and institutions of all sizes and from around the world are investing in storage solutions that are faster, far more scalable, and flexible enough to meet an array of configurations. The file system software used with these new solutions plays a critical role – just as Intel led the shift toward parallel computing several years ago, future storage solutions must be parallel too. Purpose-built to meet the most demanding storage requirements – today and into the future – the Lustre file system powers the fastest computers in the world.
Most widely used parallel file system for high performance computing1
Powers 60%+ of the fastest 100 supercomputers worldwide2
And 7 of the top 10 supercomputers too…
Virtually unlimited performance and scalability
2+ TB/second throughput; many production sites between 500 and 750 GB/second
Shared, parallel, and distributed storage – vast global namespace
Conforms to key industry standards (POSIX)
Flexible, open, highly efficient configurations preserve vendor choice
Compared with mainstream enterprise applications, compute-intensive, high-performance computing (HPC) places very different demands on storage systems. HPC applications tend to be used for compute- and data-intensive work, including computational analysis, data-intensive research, rich media, seismic processing, and large-scale simulation. Driven by CPU-intensive processing, such applications handle large volumes of data over short periods of time, in some cases while also permitting simultaneous access from multiple servers.
----- Meeting Notes (1/8/14 13:57) -----
Course content split into 3 sections:
1. Overview – Introduction to Lustre
2. Products – Intel Enterprise Edition
3. Resources to help you sell
----- Meeting Notes (1/8/14 13:57) -----
So, what is Lustre?
----- Meeting Notes (1/8/14 16:24) -----
SUPPORTS POSIX SEMANTICS. Applications designed to work with POSIX-compliant file systems will work with Lustre. Very important to MANY end users.
----- Meeting Notes (1/8/14 16:33) -----MENTION GEOSCIENCES AND ENERGY
----- Meeting Notes (1/8/14 16:19) -----So, let's take a closer look at a typical Lustre solution….
----- Meeting Notes (1/8/14 16:19) -----
All Lustre solutions, regardless of size and performance, are built using the same components. There are 3 types of dedicated servers found in Lustre solutions. Servers devoted to MANAGEMENT TASKS are the first type, as seen here. Basically, METADATA SERVERS are responsible for managing file and directory information; there is 1 METADATA SERVER for EACH FILE SYSTEM. MANAGEMENT SERVERS hold cluster configuration information, including servers and clients; there is 1 MANAGEMENT SERVER for each INSTALLATION.
----- Meeting Notes (1/8/14 16:19) -----
OBJECT STORAGE SERVERS are the next type found in a Lustre configuration. OSS provide I/O services to CLIENTS. Lustre solutions often have MANY OBJECT STORAGE SERVERS… for virtually unlimited scale-out storage. Multiple servers are aggregated together to create a single, global namespace.
----- Meeting Notes (1/8/14 16:19) -----
Lustre clients are the third type of server. Think of these as DISKLESS COMPUTE nodes that run applications and use the storage services – capacity and I/O – provided by OSS. ALL SERVERS RUN LINUX – EITHER Red Hat OR CentOS.
----- Meeting Notes (1/8/14 16:19) -----
Finally, Lustre solutions have 2 types of networks:
1. Management network
2. Data network, most often InfiniBand
----- Meeting Notes (1/8/14 16:33) -----
The result? Unmatched speed and scale: 2+ TB/s, upwards of 80K IO operations per second.
As storage solutions continue to grow in complexity, powerful yet easy-to-use software tools to install, configure, monitor, manage, and optimize Lustre-based solutions are essential. Intel® Manager for Lustre* (IML) has been purpose-built to simplify the installation, configuration, and management of fast, scalable storage solutions based on Intel Enterprise Edition for Lustre software. IML is a core component of Intel Enterprise Edition for Lustre software.
Key Features
Following are key features afforded by Intel® Manager for Lustre* software (as a component of Intel® Enterprise Edition for Lustre* software):
GUI-based creation and management of Lustre* file systems. The Intel® Manager for Lustre* software provides a powerful yet easy-to-use GUI that enables rapid creation of Lustre file systems. The GUI supports easy configuration for high availability and expansion, and enables performance monitoring and management of multiple Lustre file systems.
Graphical charts display real-time performance metrics. Fully configurable color charts display a variety of real-time performance metrics for single or multiple file systems, down to individual servers and targets, and reveal metrics such as OST balance, file system capacity, metadata operations, bandwidth, read/write operations, and various resource-usage parameters, among others.
Auto-configured high-availability clustering for server pairs. Pacemaker and Corosync are configured automatically when the system design follows configuration guidance. This removes the need to manually install HA configuration files on storage servers and simplifies high-availability configuration.
Power Distribution Unit configuration and server outlet assignments support automatic failover. The PDU tab lets you configure and manage power distribution units. On this tab you can add a detected PDU and assign specific PDU outlets to specific servers.
When you associate PDU failover outlets with servers using this tool, STONITH is automatically configured.
IPMI and BMC configuration. As an alternative to PDU configuration, support for the Intelligent Platform Management Interface and baseboard management controllers enables server monitoring, high-availability configuration, and failover.
Simplified ISO-less installation and automated deployment mechanism streamlines overall installation. The installation strategy removes the need to manually install the software on each server. Intel® Manager for Lustre* software is quickly installed on the manager server, and from there, required packages are automatically deployed to all storage servers. Storage servers and the manager server can run the same standard operating system as the rest of your estate. Additional software built for CentOS or Red Hat will also work on servers managed by Intel® Manager for Lustre* software.
End-user benefits:
Reduced management complexity and costs, enabling storage administrators to exploit the performance and scalability of Lustre storage
Accelerated deployment of fast storage to support critical applications and workflows
(Near?) real-time storage monitoring lets you track Lustre file system usage, performance metrics, events, and errors at the file system layer
(Optional, provided by 3rd parties) Storage plug-ins enable monitoring of hardware-level performance data, disk errors and faults, and other hardware-related information
Pre-existing Lustre file systems that were not configured and created using IML can be monitored using the intuitive dashboard GUI
The IML server (inside the orange box) hosts the Intel® Manager for Lustre* software, and is the server from which all management tasks are performed – from installing software onto storage servers to configuring, monitoring, and managing Lustre file systems. The IML server runs Linux and communicates with the other servers (metadata and storage) via the management network.
Defining the storage cluster and file system attributes is performed within the Configuration tab along the top of the menu bar. Working from left to right, it’s simple to define and create a new Lustre file system using Intel® Manager for Lustre*:
Servers – This tab lets you add servers to the storage system, provides server status information, and lets you start, stop, and remove servers. SSH and HTTPS are used for secure communications with servers. At least 2 servers are required for highly available configurations.
Volumes – This tab provides features to configure primary and failover servers in file systems with servers configured for high availability. Each storage target corresponds to a single volume. If servers in the volume are physically connected and then configured for high availability (using this Volumes tab and the PDUs tab, next), then primary and failover servers can be specified for the storage target. A volume may be accessible on one or more servers via different device nodes, and it may be accessible via multiple device nodes on the same host.
Power Control – Configure PDUs and then assign specific PDU outlets to specific servers.
MGT – Configure and create metadata and storage server targets.
File Systems – Select management and storage targets, adjust tuning parameters, and create Lustre file systems. The creation of the file system is started from this tab.
Storage – Charting and reporting of storage hardware using vendor-specific plug-ins.
Users – Creating and managing various administrative accounts.
You can easily monitor one or more file systems at the Dashboard, Alerts, History, and Logs pages.
The Dashboard page displays a set of charts that provide usage and performance data at several levels in the file systems being monitored, while the Alerts, History, and Log pages keep you informed of file system activity relevant to current and past file system health and performance.
Status indicator – The Status indicator provides a quick glance at the status of all managed file systems:
A green light means all is normal.
A yellow light indicates that one or more warning messages (events or alerts) have been received and should be checked. The file system may be operating in a degraded mode – for example, a target has failed over – so performance may be degraded.
A red light indicates that one or more errors have occurred. The file system may be down, or performance may be severely degraded.
Big Data problems that typically use Hadoop are getting larger, putting more demands on resources: they need more compute power, faster I/O access (perhaps faster than what HDFS can provide), and more capacity (they are processing more data). Moreover, there is an increasing need to integrate MapReduce/Hadoop applications with applications that may not be MapReduce-appropriate. Typically these applications use POSIX for I/O, which means they cannot use HDFS. On the other hand, applications that are considered “HPC” are being written to use MapReduce and Hadoop. Examples include – Genomics: Crossbow (genomic pipeline); lots of financial applications (example: portfolio analysis); Big Pharma: drug discovery, molecular docking; Chemistry: molecular dynamics; Astrophysics: image processing and analysis; geospatial data (satellite data). Since Hadoop applications require lots of local storage (not common in HPC), dedicated hardware must be purchased and admins must be trained on the care and feeding of Hadoop.