Breaking IO Performance Barriers: Scalable Parallel File System for AWS
Across all industries worldwide, HPC is helping innovative users achieve breakthrough results, from leading-edge academic research to data-intensive applications such as weather prediction and large-scale manufacturing in the aerospace and automotive sectors. As HPC-powered simulations grow ever larger and more complex, scientists are looking for cost-effective, high-performance compute resources that are available when they need them. Access to on-demand infrastructure creates opportunities to experiment and to try new, speculative models. AWS provides computing infrastructure that allows scientists and engineers to solve complex science, engineering, and business problems using applications that require high-bandwidth, low-latency networking and very high compute capabilities. Driven by its flexibility and affordability, many HPC and big data workloads are transitioning from on-premises infrastructure entirely onto AWS.

But as with on-premises HPC, maximizing the performance of "HPC cloud" workloads requires fast, highly scalable storage.

Intel® Cloud Edition for Lustre Software is purpose-built for the dynamic computing resources available from Amazon Web Services, providing the fast, massively scalable storage needed to accelerate performance even on complex workloads.

Presentation Transcript

  • © 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. Breaking IO Performance Barriers: Scalable Parallel File System for AWS. Paresh G. Pattani, Ph.D., Sr. Director, High Performance Data Solutions, Intel Corporation. July 10, 2014
  • The need for parallel storage
  • Parallel Storage Needs • Performance: time spent storing and retrieving data is time not spent on compute; fast storage maximizes processing utilization. • Scalability: growing datasets require greater amounts of storage and the ability to expand existing storage. • Reliability: large clusters and critical workloads require a comprehensive focus on data availability.
  • Scale Out Storage Using Lustre* • Purpose-built for HPC • Distributed, Parallel, Vast Global Namespace • Linux server based • Linux, Windows and Mac client support • Support for 100,000+ Clients • Designed for Reliable Storage • Now available on AWS Marketplace lustre.intel.com/cloudedition * Some names and brands may be claimed as the property of others.
  • Intel Strategy for Lustre* Storage • 1. Open Source: a powerful storage foundation for exascale applications; open-source innovation driving performance at scale; increased scale and streaming bandwidth; accelerate maturity, lower risk, and grow the ecosystem. • 2. Intel Enhanced Lustre* for HPC Clouds: extend core Lustre* with key features for new markets and use cases, for use across HPC and enterprise applications; push Lustre* onto HPC cloud infrastructure. * Some names and brands may be claimed as the property of others.
  • Use Models: Cloud Resources for HPC 1 Augment: burst peak workloads and supplement resources 2 Transition: move on-premises HPC to cloud infrastructure 3 Deploy: launch new applications exclusively to the cloud
  • Key HPC Markets Using Lustre* Today Large-scale Manufacturing Weather and Climate Life Sciences Energy Finance * Some names and brands may be claimed as the property of others.
  • What Does Intel® Cloud Edition for Lustre* Software Look Like? *Other names and brands may be claimed as the property of others.
  • Lustre* Components • Management (MGS/MGT): Lustre* mount service; the initial point of contact for clients. • Metadata (MDS/MDT): the namespace of the file system; file layouts, no data. • Storage (OSS/OST): scalable; file content stored as objects, striped across targets; scales to 100+ servers. (A sketch of striping from a client follows.) *Other names and brands may be claimed as the property of others.
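The striping described in the Storage row above is controlled from the client with the standard lfs utility (lfs setstripe / lfs getstripe). Here is a minimal sketch, in Python to match the scripting mentioned later in the deck; the mount point, file path, and stripe parameters are illustrative assumptions, not values from the presentation:

    #!/usr/bin/env python
    # Minimal sketch: create a file striped across several OSTs and
    # inspect its layout with the standard Lustre `lfs` client utility.
    # The mount point and stripe parameters below are assumptions.
    import subprocess

    LUSTRE_FILE = "/mnt/lustre/demo/output.dat"  # hypothetical path

    # Lay the new file out across 4 OSTs with a 1 MiB stripe size, so
    # sequential IO is spread over multiple object storage servers.
    subprocess.check_call(["lfs", "setstripe", "-c", "4", "-S", "1M", LUSTRE_FILE])

    # Show the resulting layout: stripe count, size, and OST indices.
    print(subprocess.check_output(["lfs", "getstripe", LUSTRE_FILE]).decode())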
  • Deploying a Storage Cluster (four screenshot slides walking through the deployment steps)
  • Monitoring & Command Line Interface
  • Performance
  • Large File Benchmark • Comparing three Lustre* cluster configurations by increasing the number of OSSs: 4, 8, and 16 OSS. • The MGS and MDS configurations are the same in each case, and 32 clients are used. • Node configurations: MGS: m1.medium, 94 MB/sec; MDS: m3.2xlarge, EBS Optimized, RAID0 of 8x 40GB Standard EBS volumes, 110 MB/sec; OSS: m3.2xlarge, EBS Optimized, 8x 100GB Standard EBS volumes, 110 MB/sec; Client: m3.2xlarge, 110 MB/sec. (A sketch of launching such a run follows.) *Other names and brands may be claimed as the property of others.
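The charts on the next two slides come from file-per-process (FPP) IOR runs swept from 1 to 32 clients. The deck does not give the exact IOR invocation, so the following is only a hedged sketch of how such a sweep might be launched; -a, -F, -w, -r, -t, -b, and -o are standard IOR flags, but the sizes, host file, and paths here are assumptions:

    #!/usr/bin/env python
    # Hedged sketch of a file-per-process IOR sweep over client counts.
    # The transfer/block sizes, host file, and test directory are
    # assumptions; the deck does not list the parameters actually used.
    import subprocess

    def run_ior(num_procs, testdir="/mnt/lustre/ior"):
        """Run IOR with one process per client and one file per process."""
        subprocess.check_call([
            "mpirun", "-np", str(num_procs), "--hostfile", "clients.txt",
            "ior",
            "-a", "POSIX",   # POSIX IO backend
            "-F",            # file per process (FPP)
            "-w", "-r",      # write phase, then read phase
            "-t", "1m",      # transfer size per IO call (assumed)
            "-b", "4g",      # data written per process (assumed)
            "-o", testdir + "/testfile",
        ])

    for n in (1, 2, 4, 8, 16, 32):  # the client counts swept in the charts
        run_ior(n)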
  • IOR Sequential Read FPP (chart): aggregate read throughput in MB/sec (0 to 1600) versus number of clients (1, 2, 4, 8, 16, 32) for the 4, 8, and 16 OSS configurations. At small client counts the clients' network is the bottleneck; at larger counts the OSSs' network is, with the 16 OSS configuration coming close to its aggregate OSS network bandwidth.
  • IOR Sequential Write FPP (chart): aggregate write throughput in MB/sec (0 to 1600) versus number of clients (1, 2, 4, 8, 16, 32) for the same three configurations, again limited first by the clients' network and then by the OSSs' network. The write results show an anomaly ("Oops…") that the next slide investigates.
  • Aggregate Performance During Run • LTOP is available, and we used it to record OST activity during the IOR run. • With a simple Python script we turned that log into a graph of aggregate performance versus time to analyze the problem: throughput peaks around 1920 MB/sec, then trails off into a long tail as the run finishes. (A sketch of such a script follows.)
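A minimal sketch of such a script, assuming an ltop log in which each snapshot begins with a Time: header followed by per-OST rows whose third column is MB/s; the real LMT/ltop column layout varies by version, so the parser would need adapting to an actual captured log:

    #!/usr/bin/env python
    # Hedged sketch: turn a recorded `ltop` log into the "aggregate
    # performance vs. time" graph shown on the slide. The snapshot
    # delimiter and column index below are assumptions about the log
    # format, not the actual LMT/ltop output specification.
    import matplotlib.pyplot as plt

    def parse_ltop_log(path):
        """Sum per-OST MB/s into one aggregate sample per snapshot."""
        totals = []
        current = None
        for line in open(path):
            if line.startswith("Time:"):        # assumed snapshot header
                if current is not None:
                    totals.append(current)
                current = 0.0
            elif current is not None:
                fields = line.split()
                if len(fields) >= 3:
                    try:
                        current += float(fields[2])  # assumed MB/s column
                    except ValueError:
                        pass                          # skip header rows
        if current is not None:
            totals.append(current)
        return totals

    totals = parse_ltop_log("ltop.log")
    plt.plot(range(len(totals)), totals)
    plt.xlabel("time (snapshot index)")
    plt.ylabel("aggregate MB/sec")
    plt.title("Aggregate performance during IOR run")
    plt.savefig("aggregate_perf.png")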
  • Compare Lustre* and NFS *Other names and brands may be claimed as the property of others.
  • Small File Benchmark • Simulated EDA benchmark: simulate the workload by compiling a package (untar; configure; make). • A Python wrapper parallelizes the compiles across the cluster using MPI, and the score is calculated as total workload / runtime. (A sketch of such a wrapper follows.) • 32 clients: Linux, c3.xlarge. • Compared with NFS: Linux, i2.4xlarge, 4x EBS RAID0.
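A hedged sketch of that wrapper using mpi4py: every rank runs untar/configure/make cycles in its own scratch directory, and rank 0 reports a score from the total work divided by the elapsed time. The package path, per-rank job count, and score units are illustrative, not the benchmark's actual values:

    #!/usr/bin/env python
    # Hedged sketch of the MPI compile-benchmark wrapper described above.
    # Paths and job counts are illustrative assumptions.
    import os
    import subprocess
    import time
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    JOBS_PER_RANK = 4                          # assumed per-rank workload
    TARBALL = "/mnt/lustre/bench/pkg.tar.gz"   # hypothetical package

    comm.Barrier()                             # start all ranks together
    start = time.time()
    for job in range(JOBS_PER_RANK):
        workdir = "/mnt/lustre/bench/r%d_j%d" % (rank, job)
        os.makedirs(workdir)
        subprocess.check_call(["tar", "xzf", TARBALL, "-C", workdir])
        srcdir = os.path.join(workdir, os.listdir(workdir)[0])
        subprocess.check_call(["./configure"], cwd=srcdir)
        subprocess.check_call(["make"], cwd=srcdir)
    comm.Barrier()                             # wait for the slowest rank
    elapsed = time.time() - start

    # Score = total workload / runtime, reported once from rank 0.
    total_jobs = comm.reduce(JOBS_PER_RANK, op=MPI.SUM, root=0)
    if rank == 0:
        print("score: %.2f jobs/hour" % (total_jobs / elapsed * 3600.0))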
  • Lustre* Configuration • 1 MGT: m3.medium. • 1 to 4 MDTs: m3.2xlarge, 8x 40GB EBS each. • 4 OSTs: c3.xlarge, 8x 40GB EBS each. *Other names and brands may be claimed as the property of others.
  • EDABench – Lustre* vs. NFS (chart): EDABench compile score (0 to 12,000) versus number of processes (1 to 128 across the 32 clients) for the 1, 2, and 4 MDT Lustre* configurations and for NFS. *Other names and brands may be claimed as the property of others.
  • Storage Instance Cost Comparison • EBS Optimized for all storage instances • Global Support for Lustre* • Does not include EBS cost. Total cost per hour: Lustre* 1x MDT + 4x OSS: $2.00; Lustre* 2x MDT + 4x OSS: $2.69; Lustre* 4x MDT + 4x OSS: $4.07; NFS i2.4xlarge: $3.51. (The arithmetic is sketched below.) *Other names and brands may be claimed as the property of others.
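The table's numbers are internally consistent: the deltas imply $0.69/hour per additional MDT node, and assuming roughly $0.07/hour for the MGT leaves $0.31/hour per OSS. A small sketch of that arithmetic; the per-node rates are inferred from the table plus one assumption, not quoted 2014 AWS prices:

    #!/usr/bin/env python
    # Sketch of the cost arithmetic behind the table above. The MDT rate
    # is inferred from the table's deltas; the MGT rate is an assumption,
    # and the OSS rate follows from the $2.00 total. Not quoted prices.
    RATE = {"mgt": 0.07, "mdt": 0.69, "oss": 0.31}  # $/hour per node

    def cluster_cost(n_mdt, n_oss=4):
        return RATE["mgt"] + n_mdt * RATE["mdt"] + n_oss * RATE["oss"]

    for n_mdt in (1, 2, 4):
        print("Lustre %dx MDT + 4x OSS: $%.2f / hour" % (n_mdt, cluster_cost(n_mdt)))
    # -> $2.00, $2.69, $4.07, matching the table above.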
  • Intel® Cloud Edition for Lustre* software *Other names and brands may be claimed as the property of others.
  • Status Today • Available on AWS Marketplace • Setup in less than 10 minutes • Try for yourself lustre.intel.com/cloudedition lustre.intel.com/contactus
  • Thank You.