Scientific Computing With Amazon Web Services

Researchers from around the world are increasingly using AWS for a wide array of use cases. This presentation describes how AWS facilitates scientific collaboration and powers some of the world's largest scientific efforts, including real-world examples from NASA JPL, the European Space Agency (ESA) and CERN's CMS particle detector.

  1. Scientific Computing on AWS: NASA/JPL, ESA and CERN. Jamie Kinney, Principal Solutions Architect, World Wide Public Sector. jkinney@amazon.com, @jamiekinney
  2. How do researchers use AWS today? Can you run HPC on AWS? Should everything run on the cloud? How does AWS facilitate scientific collaboration?
  3. Amazon Web Services: AWS Global Infrastructure, Application Services, Networking, Deployment & Administration, Database, Storage, Compute
  4. Amazon EC2
  5. ec2-run-instances (see the sketch below)
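The slide names the legacy EC2 command-line call; a minimal boto3 equivalent might look like the following sketch. The AMI ID, key pair name and region are placeholder assumptions, not values from the talk.

```python
# A minimal sketch of launching an instance programmatically, equivalent to
# the ec2-run-instances CLI call on the slide.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-12345678",   # placeholder AMI
    InstanceType="m1.large",  # one of the standard types listed on slide 15
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",     # placeholder key pair
)
print(response["Instances"][0]["InstanceId"])
```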
  7. Programmable
  10. Elastic
  11. Self Hosting vs. Elastic (chart): rigid capacity provisioned against predicted demand produces waste when it exceeds actual demand and customer dissatisfaction when it falls short; elastic capacity tracks actual demand.
  12. Go from one instance...
  13. ...to thousands
  14. Instance Types
  15. Standard (m1), High Memory (m2, m3), High CPU (c1)
  16. Cluster Compute: Intel Nehalem (cc1.4xlarge); NVIDIA GPUs (cg1.4xlarge); 2 TB of SSD, 120,000 IOPS (hi1.4xlarge); Intel Sandy Bridge E5-2670 (cc2.8xlarge); Sandy Bridge, NUMA, 240 GB RAM (cr1.4xlarge); 48 TB of ephemeral storage (hs1.8xlarge)
  18. Placement Groups
  19. Placement group diagram: EC2 instances launched into a placement group share 10 GigE networking with full bisection bandwidth.
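A hedged sketch of creating and using such a placement group with boto3; the group name, AMI ID and instance count are illustrative assumptions:

```python
# Sketch: create a cluster placement group and launch instances into it, which
# provides the 10 GigE, full-bisection-bandwidth networking shown on slide 19.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_placement_group(GroupName="mpi-cluster", Strategy="cluster")

ec2.run_instances(
    ImageId="ami-12345678",      # placeholder AMI
    InstanceType="cc2.8xlarge",  # a cluster-compute type from slide 16
    MinCount=8,
    MaxCount=8,
    Placement={"GroupName": "mpi-cluster"},
)
```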
  20. What is Scientific Computing?
  21. Use Cases:
     • Science-as-a-Service
     • Large-scale HTC (100,000+ core clusters)
     • Large-scale MapReduce (Hadoop/Spark/Shark) using EMR or EC2
     • Small to medium-scale MPI clusters (hundreds of nodes)
     • Many small MPI clusters working in parallel to explore parameter space
     • GPGPU workloads
     • Dev/test of MPI workloads prior to submitting to supercomputing centers
     • Collaborative research environments
     • On-demand academic training/lab environments
  22. Large Input Data Sets
  23. ESA Gaia Mission Overview: ESA's Gaia is an ambitious mission to chart a three-dimensional map of the Milky Way Galaxy in order to reveal the composition, formation and evolution of our Galaxy. Gaia will repeatedly analyze and record the positions and magnitudes of approximately one billion stars over the course of several years. 1 billion stars x 80 observations x 10 readouts = ~1 x 10^12 samples. At 1 ms of processing time per sample, that is more than 30 years of serial processing.
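A quick check of the slide's arithmetic (the 1 ms/sample figure is the slide's own assumption):

```python
# Back-of-the-envelope check of the Gaia processing estimate on slide 23.
stars = 1e9
observations = 80
readouts = 10
samples = stars * observations * readouts  # 8e11, rounded to ~1e12 on the slide

seconds = samples * 1e-3                   # 1 ms of processing per sample
years = seconds / (365.25 * 24 * 3600)
print(f"{samples:.1e} samples -> {years:.1f} years of serial processing")
# 8.0e+11 samples -> 25.4 years; at the rounded 1e12 samples it is ~31.7 years,
# which is the slide's "more than 30 years" on a single processor.
```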
  24. Gaia Solution Overview, traditional procurement vs. AWS:
     • Traditional: purchase at the beginning of the mission for the anticipated high-water mark. AWS: pay as you go; launch what you need, as you need it, and turn instances off when you're done.
     • Traditional: purchase additional systems for redundancy. AWS: if an instance fails, turn it off and launch a replacement at no additional charge.
     • Traditional: large-scale data reprocessing is constrained to the available infrastructure, with no way to accelerate jobs without additional CapEx. AWS: need to reprocess the data within a few hours? Simply launch more instances; 100 machines running for 1 hour cost the same as 1 machine running for 100 hours.
     • Traditional: performance is constrained to the processor/disk/memory available at the time of procurement...for a multi-year mission. AWS: new instance types running the latest hardware launch frequently; simply restart your instances on a newer instance type and stop paying for less-capable infrastructure.
     • Traditional: data transfer and security policies make it difficult to collaborate with researchers located elsewhere. AWS: easily and securely collaborate with researchers around the world.
  25. Many Iterations With Varying Parameters
  26. Linear Algebra Calculations
  28. MSL Distributed Operations (map): JPL (Pasadena, CA); CDSCC, Canberra Deep Space Communication Complex; MDSCC, Madrid Deep Space Communication Complex; GDSCC, Goldstone Deep Space Communication Complex; ARC, CheMin (Moffett Field, CA); MSSS, MARDI, MAHLI, MastCam (San Diego, CA); KSC; IKI, DAN (Moscow, Russia); INTA, REMS (Madrid, Spain); LANL, ChemCam (Los Alamos, NM); U. of Guelph, APXS (Guelph, Ontario); SwRI, RAD (Boulder, CO); GSFC, SAM (Greenbelt, MD); plus hundreds of other sites around the world for Co-Is and colleagues.
  29. Data Locality Challenges: Scientist 1 retrieves data from L.A.; Scientist 1 returns data to L.A.; Scientist 2 retrieves data from L.A.; Scientist 2 returns data to L.A.
  30. AWS Global Infrastructure: 9 regions, 25 availability zones, 38 edge locations
  31. AWS Public Data Sets: aws.amazon.com/datasets
  32. Data Locality Challenges, revisited: a researcher in L.A. uploads data to the cloud; Scientist 1 uses cloud resources to process the data; Scientist 2 retrieves data products from the edge network; Scientist 2 uses cloud resources to process the data. Global collaboration.
  34. On-Demand Pricing
  35. Reserved Instances
  36. Spot Instances (see the sketch below):
     • Bid $X per hour
     • If the current price <= bid, the instance starts
     • If the current price > bid, the instance terminates
     • Customers pay the market rate, not the bid
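A minimal sketch of the bidding mechanics in code, using boto3's classic Spot request API; the price, instance count and AMI are illustrative assumptions:

```python
# Sketch: request Spot capacity with a maximum price per hour (the "bid").
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.request_spot_instances(
    SpotPrice="0.25",               # maximum price per instance-hour (the bid)
    InstanceCount=100,
    LaunchSpecification={
        "ImageId": "ami-12345678",  # placeholder AMI
        "InstanceType": "c1.xlarge",
    },
)
# Instances run while the Spot market price stays at or below 0.25 and are
# terminated when it rises above the bid; you are charged the market price,
# not your bid.
```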
  37. U. Wisc.: CMS Particle Detector. http://www.hep.wisc.edu/~dan/talks/EC2SpotForCMS.pdf
  38. Integrated Architectures
  39. Amazon VPC with AWS Direct Connect (diagram): EC2 instances in a VPC reached over Direct Connect from sites in Los Angeles, Singapore, Japan, London, São Paulo, New York and Sydney.
  41. Secured Uplink Planning
  42. Polyphony on Amazon SWF (diagram): a decider in the JPL data center issues decision tasks through Amazon SWF; file-transfer workers upload and download file chunks to and from S3; data-processing workers run on EC2 instances created on demand; SWF coordinates the decision tasks, file-transfer tasks and data-processing tasks.
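A hedged sketch of what one of the data-processing activity workers in this architecture might look like with boto3's SWF client; the domain, task-list and worker names are placeholders, and process_chunk is a hypothetical stand-in, not JPL's actual code:

```python
# Sketch of a Polyphony-style SWF activity worker running on an EC2 instance.
import boto3

swf = boto3.client("swf", region_name="us-east-1")


def process_chunk(payload: str) -> str:
    """Hypothetical stand-in for the real data-processing step."""
    return payload.upper()


while True:
    # Long-poll SWF for the next data-processing task routed to this task list.
    task = swf.poll_for_activity_task(
        domain="polyphony-demo",               # placeholder SWF domain
        taskList={"name": "data-processing"},  # placeholder task list
        identity="ec2-worker-1",
    )
    if not task.get("taskToken"):
        continue  # the poll timed out with no work; poll again

    result = process_chunk(task.get("input", ""))
    swf.respond_activity_task_completed(taskToken=task["taskToken"], result=result)
```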
  43. SWF, EC2, S3, SimpleDB, CloudWatch, IAM, ELB: 5 gigapixels in 5 minutes!
  44. NASA workload placement (diagram): large, tightly-coupled MPI runs at Ames; large EP, smaller-scale tightly-coupled MPI, dev/test and burst capacity run on EC2; small-scale MPI and EP run on the NASA researcher's own systems.
  47. Zero to Internet-Scale in One Week!
  48. ELBs on Steroids
  49. Route 53
  50. CloudFormation
  51. CloudFront
  52. Regions and AZs
  53. Mars Science Laboratory Live Video Streaming Architecture (diagram): Telestream Wirecast feeds Adobe Flash Media Servers in Availability Zones us-east-1a and us-west-1b; in each AZ a CloudFormation stack places an Elastic Load Balancer in front of tier-1 and tier-2 Nginx caches; CloudFront provides streaming for museum partners.
  54. Battle-Testing JPL's Deployment: Benchmarking
  55. Dynamic Traffic Scaling, US-East Cache Node Performance (chart): 11.4 Gbps
  56. Dynamic Traffic Scaling, US-East Cache Node Performance (chart): 25.3 Gbps
  57. Dynamic Traffic Scaling, US-East Cache Node Performance (chart): 10.1 Gbps
  58. Dynamic Traffic Scaling, US-East Cache Node Performance (chart): 40.3 Gbps
  59. Dynamic Traffic Scaling, US-East Cache Node Performance (chart): 26.6 Gbps
  60. Dynamic Traffic Scaling, Impact on US-East FMS Origin Servers (chart): only ~42 Mbps
  61. Dynamic Traffic Scaling, Impact on US-East FMS Origin Servers (chart): only ~42 Mbps
  62. CloudFront Behaviors: Using ELBs for Dynamic Content
  63. AWS Academic Grants: aws.amazon.com/grants
  64. Thank You
