High Performance Computing with AWS


More and more, the scalable on-demand infrastructure provided by AWS is being used by researchers, scientists and engineers in Life Sciences, Finance and Engineering to solve bigger problems, answer complex questions and run larger simulations. In this session we start by talking about the supercomputing class performance and high performance storage available to the scientists and engineers at their fingertips. We will go over examples of how startups are innovating and large enterprises are extending their HPC environments. Finally, we walk through some of the common questions that come up as organizations start leveraging AWS for their high performance computing needs.

Published in: Technology


  1. 1. High Performance Computing with AWS. Jafar Shameem and David Pellerin, Business Development, HPC
  2. 2. How are Organizations Using Cloud for HPC? Migrate entire HPC applications and datacenters to the cloud; use cloud capabilities to create entirely new HPC applications; augment on-premise HPC resources with cloud capacity.
  3. 3. Why AWS for High-Performance Computing? Security: deploy applications and store data in a secure, highly configurable VPC environment. Agility: deploy the right infrastructure for each technical computing job, at the right time. Scalability: add and subtract servers in minutes to optimize time-to-results. Cost savings: pay only for what you use; don't pay for idle or outdated servers.
  4. 4. AWS for Agility. With rigid on-premise resources, predicted demand rarely matches actual demand: over-provisioning produces waste, and under-provisioning produces user/customer dissatisfaction. With elastic resources, capacity is scaled to actual demand.
  5. 5. Many purchase models to support different needs. On-Demand: pay for compute capacity by the hour with no long-term commitments; for spiky workloads, or to define needs. Reserved: make a low, one-time payment and receive a significant discount on the hourly charge; for committed utilization. Spot: bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand; for time-insensitive or transient workloads. Dedicated: launch instances within Amazon VPC that run on hardware dedicated to a single customer; for highly sensitive or compliance-related workloads. Free Tier: get started on AWS with free usage and no commitment; for POCs and getting started.
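The Spot model above can be sketched with the boto3 SDK (which postdates this deck). This is a minimal sketch, not the presenters' code; the AMI ID, instance type, and bid price are placeholders, and the client is passed in so the helper stays testable.

```python
# Sketch of the Spot purchase model, assuming boto3. The AMI ID,
# instance type, and bid price are placeholders.

def spot_request_params(ami_id, instance_type, count, max_price_usd):
    """Build the parameters for an EC2 Spot instance request."""
    return {
        "SpotPrice": str(max_price_usd),   # bid ceiling; you pay the fluctuating Spot price
        "InstanceCount": count,
        "Type": "one-time",                # suits transient, time-insensitive jobs
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
        },
    }

def request_spot_capacity(ec2_client, ami_id, instance_type, count, max_price_usd):
    """Submit the request; `ec2_client` is e.g. boto3.client("ec2")."""
    return ec2_client.request_spot_instances(
        **spot_request_params(ami_id, instance_type, count, max_price_usd))
```

A real run needs AWS credentials, and the request is fulfilled only while the Spot price stays at or below the bid, which is why Spot fits the "time-insensitive or transient" workloads the slide names.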
  6. 6. AWS for Scale. Massive scale allows AWS to constantly reduce costs, while improving quality and reliability. TCO of cloud is much lower than on-premise IT when all costs are considered. The result? Large-scale datacenter-to-cloud migrations are in progress every day.
  7. 7. Scalable Computing: Go From Just One Instance…
  8. 8. To Thousands… in Just Minutes!
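Scaling from one instance to thousands is, at the API level, a single call. A hedged sketch assuming boto3 (the AMI ID and instance type below are placeholders):

```python
# Sketch: asking EC2 for a large batch of identical nodes in one call,
# assuming boto3. AMI ID and instance type are placeholders.

def fleet_params(ami_id, instance_type, count):
    """RunInstances parameters; MinCount == MaxCount makes the launch
    all-or-nothing, which a tightly coupled HPC job usually wants."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
    }

def launch_fleet(ec2_client, ami_id, instance_type, count):
    """Launch `count` nodes; `ec2_client` is e.g. boto3.client("ec2")."""
    return ec2_client.run_instances(**fleet_params(ami_id, instance_type, count))
```

Setting MinCount below MaxCount instead would let EC2 deliver a partial fleet, which can be the better choice for embarrassingly parallel workloads.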
  9. 9. Choose the Right Instance Type for the Job. Micro: 613 MB, up to 2 ECUs. Small: 1.7 GB, 1 EC2 Compute Unit, 1 virtual core. Medium: 3.7 GB, 2 EC2 Compute Units, 1 virtual core. Large: 7.5 GB, 4 EC2 Compute Units, 2 virtual cores. Extra Large: 15 GB, 8 EC2 Compute Units, 4 virtual cores. Hi-Mem XL: 17.1 GB, 6.5 EC2 Compute Units, 2 virtual cores. Hi-Mem 2XL: 34.2 GB, 13 EC2 Compute Units, 4 virtual cores. Hi-Mem 4XL: 68.4 GB, 26 EC2 Compute Units, 8 virtual cores. High-CPU Med: 1.7 GB, 5 EC2 Compute Units, 2 virtual cores. High-CPU XL: 7 GB, 20 EC2 Compute Units, 8 virtual cores. Cluster Compute 4XL: 23 GB, 33.5 EC2 Compute Units. Cluster Compute 8XL: 60.5 GB, 88 EC2 Compute Units. Cluster GPU 4XL: 22 GB, 33.5 EC2 Compute Units, 2 x NVIDIA Tesla "Fermi" M2050 GPUs. High I/O 4XL: 60.5 GB, 35 EC2 Compute Units, 2 x 1024 GB SSD-based local instance storage. High Storage 8XL: 117 GB, 35 EC2 Compute Units, 24 x 2 TB instance store. Cluster High Mem 8XL: 244 GB, 88 EC2 Compute Units, SSD instance storage.
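Choosing the right type for a job comes down to matching its memory and core requirements at the lowest hourly cost. A small sketch of that matching logic; the catalog entries and prices below are illustrative placeholders, not AWS rates:

```python
# Sketch of matching a job to an instance type. The catalog figures
# (memory, cores, prices) are illustrative placeholders only.

CATALOG = [
    {"name": "Small",      "mem_gib": 1.7,  "cores": 1, "usd_hour": 0.06},
    {"name": "Large",      "mem_gib": 7.5,  "cores": 2, "usd_hour": 0.24},
    {"name": "Hi-Mem 4XL", "mem_gib": 68.4, "cores": 8, "usd_hour": 1.80},
]

def pick_instance(catalog, min_mem_gib, min_cores):
    """Return the cheapest type meeting the job's memory and core needs,
    or None if nothing in the catalog fits."""
    fits = [t for t in catalog
            if t["mem_gib"] >= min_mem_gib and t["cores"] >= min_cores]
    return min(fits, key=lambda t: t["usd_hour"])["name"] if fits else None
```

For example, a solver needing 4 GiB and 2 cores lands on the cheapest adequate type rather than the largest one available.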
  10. 10. AWS for Innovation. On-premise: experiment infrequently, failure is expensive ($ millions), less innovation. Cloud: experiment often, fail quickly at a low cost (nearly $0), more innovation.
  11. 11. Focus on innovation; leave the muck of infrastructure management to AWS.
  12. 12. HPC Applications Running on AWS Today. Engineering: CAD and CAE for aerospace, defense, structures, consumer products. Life Sciences: for basic research, drug discovery, genomics, and translational medicine. Energy and Geophysics: including seismic processing, reservoir estimation, high-energy simulation, wind energy modeling, GIS. Financial Services and Insurance: including valuation and risk analytics. And many more!
  13. 13. HPC for Engineering: Scalable Computing for CAD/CAE/EDA
  14. 14. AWS for Engineering. Computer-Aided Design, Simulation, Analysis, Visualization: for development of commercial and military products; aerospace, automotive, civil, construction, energy, and others; across industries, the trend is Simulation-Driven Design. Examples: Computer-Aided Design (CAD) including 3D models; Electronic Design Automation (EDA); Computational Fluid Dynamics (CFD); Finite Element Analysis (FEA) and Thermal Analysis; Crash Analysis; Failure and Hazard Analysis.
  15. 15. CFD for Turbine Engine Design. Time-accurate fluid dynamics; an SBIR-funded project for the US Air Force Research Laboratory (AFRL); SAS 70 Type II certification and VPN-level access required. Additional security measures: uploaded and downloaded data was encrypted; dedicated EC2 cluster instances were provisioned; data was purged upon completion of the run. "The results of this case were impressive. Using Amazon EC2 the large-scale, time accurate simulation was turned around in just 72 hours with computing infrastructure costs well below $1,000."
  16. 16. Radiation Simulation for ASIC Design. A commercial provider of mixed-signal ASICs for X-ray and gamma ray detection and imaging needs to perform very large Monte Carlo simulations using as many as 4,000 server nodes. Computing workloads are highly variable and project-driven; building an on-premise cluster to handle peak loads would be cost prohibitive. Solution: EC2 3rd-generation High-Memory instances, with up to 80% savings by using Spot instances on EC2.
  17. 17. Scenarios for Technical Software. 1) Customer Managed Application Hosting: customer has an account with AWS and manages infrastructure; customer maintains traditional software vendor relationships; software vendor offers license flexibility (BYOL). 2) Vendor Managed Hosting to Augment an On-Premise Application: client-server model for acceleration of batch tasks; customer pays the software vendor for AWS-hosted services; customer does not need to manage low-level infrastructure. 3) Vendor Managed Software-as-a-Service: pay-per-use, fully web-based including GUI.
  18. 18. Trusted by Enterprises Worldwide
  19. 19. HPC for Life Sciences: Customer Case Studies
  20. 20. And a rich history in Life Sciences
  21. 21. AWS Public Data Sets. A centralized repository of public datasets, with seamless integration with cloud-based applications, at no charge to the community. Some of the datasets available today: 1000 Genomes Project, Ensembl, GenBank, Illumina Jay Flatley Human Genome Dataset, YRI Trio Dataset, The Cannabis Sativa Genome, UniGene, Influenza Virus, PubChem. Tell us what else you'd like for us to host.
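These data sets are exposed as ordinary S3 buckets, so browsing one is a plain object listing. A sketch assuming boto3, with the client injected so the helper is testable; the `release/` prefix is illustrative:

```python
# Sketch of browsing a public data set bucket. The client is injected
# so the helper stays testable; the prefix below is illustrative.

def list_dataset_keys(s3_client, bucket, prefix, limit=10):
    """Return up to `limit` object keys under `prefix` in a bucket."""
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=limit)
    return [obj["Key"] for obj in resp.get("Contents", [])]

# Usage (requires boto3; public data sets allow unsigned requests):
#   import boto3
#   from botocore import UNSIGNED
#   from botocore.config import Config
#   s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
#   keys = list_dataset_keys(s3, "1000genomes", "release/")
```

Because the buckets are public, an unsigned client works; no AWS account is needed to read the data.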
  22. 22. Open Source Ecosystem. NCBI BLAST, Crossbow, CloudBurst, Myrna, CloVR, BioPerl Max, ViPDAC, Superfamily, Cloud-Coffee, BioNimbus, GMOD, CloudAligner, CRdata, SeqWare, Blend, StormSeq, BioConductor. Cluster and configuration tooling: StarCluster, Sun Grid Engine, Condor, Torque, Slurm, Rocks, Chef, Puppet. Get links to AMIs at:
  23. 23. The number of cluster nodes can be scaled up or down depending on computational needs.
  24. 24. Why AWS? Remove constraints: capex, operational skills, processing limitations. Focus on the problem, not the technical challenges of large compute clusters. Achieve more: perform bigger, more complex jobs in much reduced time. Iterate around the problem: do more, and afford to take more risks, as the cost of experimentation is reduced.
  25. 25. Data Transfer. AWS Import/Export: move large amounts of data into and out of AWS; for data migration, content distribution, DR, etc. AWS Direct Connect: a secure private link to AWS with 1 Gbps or 10 Gbps connectivity; you can also co-locate hardware in AWS Direct Connect locations. Bandwidth optimization solutions: commercial providers such as Aspera, Riverbed, and Attunity; open source options such as Tsunami UDP and Globus Online.
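The bandwidth-optimization tools above share one core idea: split a large object into parts that can be transferred concurrently and retried independently. A minimal sketch of that splitting step (the 64 MiB default is an illustrative choice):

```python
# Sketch of the splitting step behind parallel/multipart transfer.
# The 64 MiB default part size is an illustrative assumption.

def part_ranges(total_bytes, part_size=64 * 1024 * 1024):
    """Return (offset, length) pairs covering `total_bytes`, so each
    part can be uploaded concurrently and retried on its own."""
    parts, offset = [], 0
    while offset < total_bytes:
        length = min(part_size, total_bytes - offset)
        parts.append((offset, length))
        offset += length
    return parts
```

With boto3 today, S3's managed transfer does this automatically: `s3.upload_file(...)` with a `TransferConfig` setting `multipart_chunksize` and `max_concurrency`.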
  26. 26. Storage Options. Relational Database Service: fully managed database (MySQL, Oracle, MSSQL). DynamoDB: NoSQL, schemaless, provisioned-throughput database. S3: object datastore with up to 5 TB per object and 99.999999999% durability. SimpleDB: NoSQL, schemaless, for smaller datasets. Redshift: petabyte-scale, fully managed data warehousing service.
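A common split across these stores is bulk payloads in S3 with queryable metadata in DynamoDB. A hedged sketch of that pattern; the bucket and key layout are hypothetical, and the clients are injected (e.g. `boto3.client("s3")` and `boto3.resource("dynamodb").Table("jobs")`):

```python
# Sketch: result blobs in S3, index records in DynamoDB. Bucket name
# and key layout are hypothetical; clients are injected for testability.

def store_result(s3_client, ddb_table, bucket, job_id, payload):
    """Write a result blob to S3 and index it in DynamoDB."""
    key = f"results/{job_id}.bin"
    s3_client.put_object(Bucket=bucket, Key=key, Body=payload)
    ddb_table.put_item(Item={"job_id": job_id, "s3_key": key,
                             "size_bytes": len(payload)})
    return key
```

The DynamoDB item stays small and cheap to query, while S3 carries the multi-gigabyte result itself.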
  27. 27. Objects in S3: 1.3 trillion, with a peak of 835,000 transactions per second.
  28. 28. Archival. Glacier: long-term cold storage from $0.01 per GB/month, with 99.999999999% durability. "Every day our genome sequencers produce terabytes of data. As our company moves into the clinical space, we face a legal requirement to archive patient data for years that would drastically raise the cost of storage. Thanks to Amazon Glacier's secure and scalable solution, we will be able to provide cost-effective, long-term storage and thereby eliminate a barrier to providing whole genome sequencing for medical treatment of cancer and other genetic diseases." - Keith Raffel, Senior Vice President and Chief Commercial Officer, Complete Genomics
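One common route into Glacier (newer than this deck) is an S3 lifecycle rule that ages objects into the archive tier automatically. A sketch of the rule as boto3 expects it; the prefix and age threshold below are illustrative:

```python
# Sketch: archiving cold data to Glacier via an S3 lifecycle rule.
# The "runs/" prefix and 30-day threshold are illustrative assumptions.

def glacier_lifecycle(prefix, archive_after_days):
    """Lifecycle configuration moving objects under `prefix` to Glacier
    once they are `archive_after_days` old."""
    return {
        "Rules": [{
            "ID": f"archive-{prefix.rstrip('/')}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": [{"Days": archive_after_days,
                             "StorageClass": "GLACIER"}],
        }]
    }

# Usage (boto3, credentials required):
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="sequencer-output",
#       LifecycleConfiguration=glacier_lifecycle("runs/", 30))
```

For the sequencer workload in the quote, a rule like this means fresh runs stay in S3 for active analysis while older runs drift into the cheaper archive tier without any application code.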
  29. 29. Elastic MapReduce: a managed, elastic Hadoop cluster. Scalable: use as many or as few compute instances running Hadoop as you want, and modify the number of instances while your job flow is running. Integrated with other services: works seamlessly with S3 as origin and output, and integrates with DynamoDB. Comprehensive: supports languages such as Hive and Pig for defining analytics, and allows complex definitions in Cascading, Java, Ruby, Perl, Python, PHP, R, or C++. Cost effective: works with Spot instance types. Monitoring: monitor job flows from within the management console.
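Provisioning such a cluster is a single API call. A sketch using boto3's `run_job_flow`; the release label, IAM roles, and instance type are illustrative modern defaults, not values from the deck:

```python
# Sketch of provisioning a transient EMR cluster via boto3's run_job_flow.
# Release label, roles, and instance type are illustrative assumptions.

def emr_cluster_params(name, node_count, instance_type="m5.xlarge"):
    """Parameters for a transient EMR cluster of `node_count` instances."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": node_count,
            "KeepJobFlowAliveWhenNoSteps": False,  # tear down when steps finish
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }

def launch_cluster(emr_client, name, node_count):
    """`emr_client` is e.g. boto3.client("emr")."""
    return emr_client.run_job_flow(**emr_cluster_params(name, node_count))
```

Setting `KeepJobFlowAliveWhenNoSteps` to False gives the pay-for-what-you-use behavior the slide emphasizes: the cluster exists only while work runs.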
  30. 30. EMR Jobs: 3.7 million clusters launched since May 2010.
  31. 31. Crossbow: align billions of reads and find SNPs, reusing software components via Hadoop Streaming. Map: Bowtie (Langmead et al., 2009) finds the best alignment for each read and emits (chromosome region, alignment). Shuffle: Hadoop groups and sorts alignments by region. Reduce: SOAPsnp (Li et al., 2009) scans alignments for divergent columns, accounting for sequencing error and known SNPs. Searching for SNPs with Cloud Computing. Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL (2009) Genome Biology. 10:R134
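The map/shuffle/reduce shape described above can be sketched in miniature. This is toy logic standing in for Bowtie and SOAPsnp (the real tools run as Hadoop Streaming commands), and the 100 kb region binning is an assumption for illustration:

```python
# Toy sketch of Crossbow's map/shuffle/reduce shape. Bowtie and SOAPsnp
# are replaced by stand-in logic; the 100 kb binning is an assumption.
from collections import defaultdict

def map_read(read_id, chrom, pos):
    """Map: emit (chromosome region, alignment) for one aligned read."""
    return (f"{chrom}:{pos // 100_000}", (read_id, pos))

def shuffle(pairs):
    """Shuffle: group alignments by region key, as Hadoop would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_region(key, alignments):
    """Reduce: where SOAPsnp would scan columns for SNPs, just report depth."""
    return (key, len(alignments))

pairs = [map_read("r1", "chr1", 150_200), map_read("r2", "chr1", 150_950)]
results = [reduce_region(k, v) for k, v in shuffle(pairs).items()]
# both reads fall in the same 100 kb bin, so one region with depth 2
```

The point of the binning key is that all alignments landing in one region reach the same reducer, so the SNP scan never needs the whole genome in memory at once.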
  32. 32. Pfizer: worldwide research and development. "The Amazon Virtual Private Cloud was a unique option that offered an additional level of security and an ability to integrate with other aspects of our infrastructure." "AWS enables Pfizer's WRD to explore specific difficult or deep scientific questions in a timely, scalable manner and helps Pfizer make better decisions more quickly." - Dr. Michael Miller, Head of HPC for R&D, Pfizer
  33. 33. Spiral Genetics: alignment, variant calling, annotation. Turnaround time: targeted, less than 40 minutes; exome, less than 2 hours; whole genome, less than 5 hours.
  34. 34. Globus Genomics. Workflows can be easily defined and automated with integrated Galaxy Platform capabilities; data movement is streamlined with integrated Globus file-transfer functionality; resources can be provisioned on-demand with Amazon Web Services cloud-based infrastructure.
  35. 35. Syapse: Bringing Omics into Routine Medical Use (©2013 Syapse). A pipeline from laboratory testing and test results to clinical use, built on the Syapse Semantic Data Platform with the Syapse Omics Medical Record, Physician Portal, and Discovery applications.
  36. 36. Harvard Medical School, The Laboratory of Personal Medicine: run EC2 clusters to analyze entire genomes. By leveraging Spot instances in workflows, one day's worth of effort resulted in 50% savings in cost. "The AWS solution is stable, robust, flexible, and low cost. It has everything to recommend it." - Dr. Peter Tonellato, LPM, Center for Biomedical Informatics, Harvard Medical School
  37. 37. Illumina BaseSpace. Data analysis: alignment, assembly, QC, and analysis. Share data with colleagues; access high-quality and diverse datasets.
  38. 38. We are here to help: Enterprise Support, Trusted Advisor, Professional Services, and Sales and Solutions Architects.
  39. 39. Thank You. Jafar Shameem, David Pellerin.