An MPI-IO Cloud Cluster Bioinformatics Summer Project (BDT205) | AWS re:Invent 2013

Researchers at Clemson University assigned a student summer intern to explore bioinformatics cloud solutions that leverage MPI, the OrangeFS parallel file system, AWS CloudFormation templates, and a Cluster Scheduler. The result was an AWS cluster that runs bioinformatics code optimized using MPI-IO. We give an overview of the process and show how easy it is to create clusters in AWS.

  1. An MPI-IO Cloud Cluster Bioinformatics Summer Project. Brandon Posey, Dougal Ballantyne, Boyd Wilson. November 13, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. Filesystems on AWS
  3. What filesystems *MUST* you use on AWS?
  4. The one that meets the needs of your unique application! Some things to consider: • Total amount of storage required? • Resilience required? • Expected number of clients? • Locality of servers and clients? • Average file sizes (KB, MB, GB, TB)? • Block sizes used by applications? • IO profile (read/write %)? • Typical IO use case?
  5. Filesystems on AWS are all about building blocks!
  6. Building Blocks • Amazon Elastic Compute Cloud (Amazon EC2) – 1 ECU to 88 ECU of compute power – 613 MB to 240 GB of memory – Shared network, EBS optimized, dedicated 10 Gb • Amazon Simple Storage Service (Amazon S3) – Unlimited capacity – Web-scale – Lifecycle management
  7. Building Blocks • Local storage (ephemeral) – 150 GB to 3360 GB per instance – HDD and SSD – FREE! (part of instance cost) • Amazon Elastic Block Store (Amazon EBS) – 1 GB to 1000 GB per volume – Standard and Provisioned IOPS – Multiple volumes per instance – Supports snapshots to Amazon S3
  8. Storage-optimized EC2 instances http://aws.amazon.com/ec2/instance-types/ "This family includes the HI1 and HS1 instance types, and provides you with Intel Xeon processors and direct-attached storage options optimized for applications with specific disk I/O and storage capacity requirements." • HI1 instances feature SSD storage • HS1 instances feature direct-attached HDD
  9. Amazon EBS optimized instances http://aws.amazon.com/ebs/ "To enable your Amazon EC2 instances to fully utilize the IOPS provisioned on an EBS volume, you can launch selected Amazon EC2 instance types as “EBS-Optimized” instances."
  10. What Are Your Needs? • Temporary or long-term storage? • Shared or per instance? • How much? • How fast?
  11. Long-term storage • Use Amazon S3 • Pull datasets when needed • Easy to access using the AWS CLI or API: $ aws s3 cp s3://mybucket/dataset/input /ephemeral/input • Lifecycle to Amazon Glacier
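
A minimal sketch of the lifecycle idea above, assuming boto3 (the current AWS SDK for Python rather than the boto 2 in use in 2013) and a made-up bucket name and prefix:

import boto3

s3 = boto3.client("s3")

# Archive everything under dataset/ to Amazon Glacier after 90 days
# (bucket, prefix, and the 90-day threshold are illustrative).
s3.put_bucket_lifecycle_configuration(
    Bucket="mybucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-datasets",
            "Filter": {"Prefix": "dataset/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
        }]
    },
)

# Pull an input object down to local scratch, equivalent to the CLI copy above.
s3.download_file("mybucket", "dataset/input", "/ephemeral/input")
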
  12. Temporary Storage • Local ephemeral for scratch • Distributed filesystem for high-performance scratch – OrangeFS – Lustre – Ceph • Pull data from Amazon S3
  13. How much? • With Amazon S3, you pay for what you use • With Amazon EBS, you pay for what you provision • Keeping data in Amazon S3 and pulling only what is needed helps manage cost
  14. How fast? • Ephemeral storage can deliver up to 2.2 GB/sec – more instances == more throughput • Amazon EBS volumes support up to 4000 IOPS – more volumes == more IOPS • Amazon S3 scales horizontally – more clients == more throughput – more connections == more throughput
  15. Making filesystems persist • Use Amazon EBS for block storage • Use Amazon EBS snapshots for recovery • Use a replicated distributed filesystem
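
A rough sketch of those two building blocks in code, again with boto3 and made-up sizes and names; it provisions a Provisioned IOPS volume and then snapshots it for recovery:

import boto3

ec2 = boto3.client("ec2")

# A Provisioned IOPS volume that could back a file system's storage directory.
vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,                 # GB, illustrative
    VolumeType="io1",
    Iops=4000,
)

# Periodic snapshots are stored in Amazon S3 and give point-in-time recovery.
ec2.create_snapshot(
    VolumeId=vol["VolumeId"],
    Description="OrangeFS data volume backup",
)
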
  16. Automating deployments • AWS CloudFormation • Drive storage through parameters • Easy to set up and tear down • Track template changes in SCM
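
Set-up and tear-down can themselves be scripted. The sketch below uses boto3; the stack name, template URL, and parameter values are placeholders, not the actual template shown on the later slides:

import boto3

cfn = boto3.client("cloudformation")

# Create the cluster stack, driving configuration through parameters.
cfn.create_stack(
    StackName="orangefs-cluster",
    TemplateURL="https://s3.amazonaws.com/mybucket/cluster-template.json",
    Parameters=[
        {"ParameterKey": "KeyName", "ParameterValue": "my-keypair"},
        {"ParameterKey": "AccessFrom", "ParameterValue": "0.0.0.0/0"},
    ],
    Capabilities=["CAPABILITY_IAM"],   # the template creates IAM resources
)

# ...run the workload...

# Tear everything down when the work is finished.
cfn.delete_stack(StackName="orangefs-cluster")
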
  17. Solutions on AWS • OrangeFS from Omnibond • Red Hat Storage 2.0 • Intel Cloud Edition Lustre - Private Beta
  18. Customer presentation
  19. RNA-Seq Differential Gene Expression Workflow Clemson University professor Dr. Alex Feltus had been discussing optimization of the gene expression workflow with Eddie Duffy and Dr. Barr Von Oehsen. As a result, a summer project with Brandon Posey was started to pursue this optimization in the AWS cloud. The longest processing steps were the FastQ steps, which is where the optimization started. *Workflow chart provided with permission from Allele Systems (www.allelesystems.com)
  20. OrangeFS – Scalable Parallel File System on AWS [architecture diagram: a unified high-performance file system built from OrangeFS instances, Amazon EBS volumes, and Amazon DynamoDB] Available on the AWS Marketplace and brought to you by Omnibond
  21. Cloud Cluster Built using AWS, Torque/Maui, and OrangeFS. Optimization areas: • Data uploaded and retrieved via the OrangeFS WebDAV interface • MPI jobs are submitted via the Torque & Maui scheduler • All built with an AWS CloudFormation template [architecture diagram: MPI-IO clients, Torque/Maui, OrangeFS WebDAV, OrangeFS servers, Amazon DynamoDB]
  22. AWS CloudFormation Prompts "KeyName" : { "VpcId" : { "VpcPublicSubnetId" : { "NAT & OrangeFS… AccessFrom" : { "FSConfigDDB" : {… "WorkerConfigDDB" : {… "Type" : "AWS::DynamoDB::Table", "CfnUser" : { …. "Type" : "AWS::IAM::User",…
  23. AWS CloudFormation – Amazon DynamoDB "FSConfigDDB" : { "Type" : "AWS::DynamoDB::Table", … "WorkerConfigDDB" : { "Type" : "AWS::DynamoDB::Table", …
  24. AWS CloudFormation – IAM & Network "instanceRootRole" : { "instanceRootProfile" : { "HostKeys" : { "PrivateSubnet" : { "PrivateRouteTable" : { "PrivateSubnetRouteTableAssociation" : { "PrivateNetworkAcl" : { "NATIPAddress" : {… "Type" : "AWS::EC2::EIP",
  25. AWS CloudFormation – Instances "NATDevice" : {… "Type" : "AWS::EC2::Instance", "MasterCoordinator" : {… "Type" : "AWS::EC2::Instance", "OrangeFSFleet" : {… "Type" : "AWS::AutoScaling::AutoScalingGroup", "WorkerFleet" : {… "Type" : "AWS::AutoScaling::AutoScalingGroup", "WebDavDevice" : {… "Type" : "AWS::EC2::Instance",
  26. AWS CloudFormation – Cloud-Init (Python & Boto) "sudo /usr/bin/python2.7 /home/ec2-user/TorqueMasterConfigure.py -l DEBUG -f /home/ec2-user/MasterConfig.log", " -n ", {"Ref" : "WorkerConfigDDB"}, " -o ", {"Ref" : "FSConfigDDB"}, " -s ", {"Fn::FindInMap" : [ "ConfigParameters", "OrangeFSFleetSize", "item"]}, " -z ", {"Fn::FindInMap" : [ "ConfigParameters", "WorkerFleetSize", "item"]}, " -m ", {"Fn::FindInMap" : [ "ConfigParameters", "WorkerMaxFleetSize", "item"]}, " -p ", {"Fn::FindInMap" : [ "ConfigParameters", "OrangeFSPort", "item"]}, " -a ", {"Fn::FindInMap" : [ "ConfigParameters", "FSName", "item"]}, " -d ", {"Fn::FindInMap" : [ "ConfigParameters", "FSID", "item"]}, "\n",
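
The TorqueMasterConfigure.py script itself is not shown in the deck. As a guess at the mechanism, the master node could read the worker and file-system records out of the DynamoDB tables passed via -n and -o; here is a minimal sketch with boto3 (the original used boto 2):

import boto3

def read_cluster_config(worker_table, fs_table, region="us-east-1"):
    """Return the worker and OrangeFS server records registered in DynamoDB."""
    ddb = boto3.resource("dynamodb", region_name=region)
    workers = ddb.Table(worker_table).scan().get("Items", [])
    fs_nodes = ddb.Table(fs_table).scan().get("Items", [])
    return workers, fs_nodes

# Table names would come from the -n and -o arguments shown above.
workers, fs_nodes = read_cluster_config("WorkerConfigDDB", "FSConfigDDB")
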
  27. Demo • Spin up a cluster on AWS live
  28. RNA-Seq Differential Gene Expression Workflow Optimization areas: • FastQ-Splitter rewritten in MPI-IO to leverage OrangeFS in AWS • Merge-FastQ also rewritten in MPI-IO to leverage OrangeFS in AWS *Workflow chart provided with permission from Allele Systems (www.allelesystems.com)
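
The rewritten FastQ-Splitter and Merge-FastQ sources are not included in the deck. The sketch below only illustrates the general MPI-IO pattern they rely on, collective reads and writes against one shared file on the OrangeFS mount, using mpi4py with made-up paths and with FASTQ record boundaries ignored for brevity:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Open the shared input file on the OrangeFS mount (path is illustrative).
fin = MPI.File.Open(comm, "/mnt/orangefs/input.fastq", MPI.MODE_RDONLY)
total = fin.Get_size()

# Give each rank a contiguous byte range; the last rank takes the remainder.
chunk = total // size
offset = rank * chunk
length = total - offset if rank == size - 1 else chunk

buf = np.empty(length, dtype=np.uint8)
fin.Read_at_all(offset, buf)     # collective read of this rank's slice
fin.Close()

# Every rank writes its slice into one shared output file at the same offset,
# so the data stays in the parallel file system with no inter-rank shuffling.
fout = MPI.File.Open(comm, "/mnt/orangefs/output.fastq",
                     MPI.MODE_WRONLY | MPI.MODE_CREATE)
fout.Write_at_all(offset, buf)   # collective write
fout.Close()

Run under mpirun from a Torque job script (like the one on slide 30), every rank touches the parallel file system directly instead of funneling data through a single node.
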
  29. Genomics – Data @@@FFF=BFHFDHCCDECJHIIIHG@GEEGAGEHFDHDHGIF@FGDEBFGIIGG=CGFGCDCEGH FEEECEBADBB?BCCCC<5:>@CCCA<9>C@A@ACB @HWI-ST1097:170:C1LBBACXX:6:1101:1379:2208 1:N:0:CGATGT CCTGTTATTGCCTCAAACTTCCGTGGCCTAAAACGCCAAAGTCCCCCTAAGAAGATAGCTGCGGG GGGGTGGCTCCGCCTAGCTAGTTAGGAAGCTGAGGG + CCCFFFFFHHHHHJJJJJJJJJJFAC8A*1?E################################# #################################### @HWI-ST1097:170:C1LBBACXX:6:1101:1582:2059 1:N:0:CGATGT GTATTGTCATAAGCAGTTAAAGCTGATGTGCGCCTGTCATGTAATGCTGTAGAAACAAGCTCAGC AAGCTGCTGCTTTTGTGTTCTTGCACCGGAGNTCTT
  30. Torque/Maui Job
      #!/bin/bash
      #PBS -l nodes=4
      #PBS -l walltime=4:00:00
      #PBS -j oe
      #PBS -q batch
      #PBS -N AWS
      cd /mnt/orangefs
      mpirun /usr/local/bin/concat -p '/mnt/orangefs/Sample_Feltus1_L006_R2.cat.fastq.*' -o Combined.fastq >> /mnt/orangefs/Results.txt
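      Once the cluster is up, a script like this is handed to the Torque server with qsub (for example, qsub job.pbs, where the file name is arbitrary), and Maui decides when and where it runs on the worker fleet.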
  31. FastQ Splitter Time (seconds) [bar charts: the old method plotted on a 0 to 4000 second axis, and per-instance times for cc2.8xlarge, m3.xlarge, and m1.xlarge plotted on a 0 to 100 second axis, split into Read Input, Transfer, and Write Output]
  32. FastQ Merge Time (seconds) [bar charts: the old method plotted on a 0 to 2500 second axis, and merge times for cc2.8xlarge, m3.xlarge, and m1.xlarge plotted on a 0 to 120 second axis]
  33. Demo • Torque/Maui job on the cluster that was spun up
  34. More Info • AWS Marketplace – OrangeFS Community Edition – OrangeFS Advanced Edition • Community: orangefs.org • Pipeline – Allele Systems: allelesystems.com
  35. Please give us your feedback on this presentation (BDT205). As a thank you, we will select prize winners daily for completed surveys!
