
High Performance Cloud Computing

Slides from the AWS tutorial at Supercomputing 2011. Related tutorial materials at: cloudsupercomputing.net

Published in: Technology, Business

  1. 1. High Performance Cloud Computing: Supercomputing 2011
  2. 2. Hello
  3. 3. Thank you
  4. 4. HPC with AWS
  5. 5. Understand the services, tools and patterns for building high performance systems in the
  6. 6. AGENDA. SC11, Monday 14th November, 2011. Cloud Concepts. Building Blocks. Technical & Scientific Computing. Loosely Coupled Systems. Hands-on Session #1. Parallel Computation. Hands-on Session #2. Wrap up
  7. 7. There Will Be Code
  8. 8. Cloud Concepts: A prelude
  9. 9. Consumer business. Seller business
  10. 10. Decades of experience Operations, management and scale
  11. 11. Programmatic access
  12. 12. Unexpected innovation
  13. 13. Blinding flash of the obvious
  14. 14. Five years young
  15. 15. Infrastructure services
  16. 16. Compute. Storage. Placeholder. Services. Databases & Support
  17. 17. Idea Results
  18. 18. Idea Results Heavy lifting
  19. 19. Idea. Results. Heavy lifting (70%): Scale. Redundancy. Orchestration. Capacity Management. Procurement
  20. 20. 30%. Idea. Results. Infrastructure
  21. 21. Idea Results AWS
  22. 22. Idea Results AWS
  23. 23. Five things I wish I’d known when I was getting started.
  24. 24. 1: Signing up
  25. 25. On the web
  26. 26. Free tier. For new customers: aws.amazon.com/free
  27. 27. 750 hours of compute. 10 GB network attached storage. 5 GB object storage. Key/value store, notifications, messaging
  28. 28. 2: Interacting
  29. 29. HTTP, REST, SOAP
  30. 30. API driven HTTP, REST, SOAP
  31. 31. CLI
  32. 32. ec2-run-instances
  33. 33. ec2-terminate-instances
  34. 34. Java, Python, Ruby, .Net, PHP, iOS and Android
  35. 35. SDK: Java, Python, Ruby, .Net, PHP, iOS and Android
  36. 36. Management console
  37. 37. Linux
  38. 38. Certificate-based root access
  39. 39. mza$ ssh -i web/us-east/aws-web.pem root@ec2-204-236-247-169.compute-1.amazonaws.com
      Last login: Wed Jun 22 11:15:20 2011 from 82.26.6.99
        __|  __|_  )   CentOS
        _|  (     /    v5.4
       ___|___|___|    HVMx64
      Welcome to an EC2 Public Image :-)
      [root@ip-10-17-135-244 ~]#
  40. 40. Windows
  41. 41. Administrator access
  42. 42. 3: Storage options
  43. 43. Ephemeral storage
  44. 44. Ephemeral storage: Included with compute. Lost at termination. Not backed up
  45. 45. When it’s gone, it’s gone
  46. 46. Hands-on
  47. 47. Elastic Block Store: Hands-on
  48. 48. Elastic Block Store: Network attached. Mount as volume. Snapshot. Persistent
  49. 49. Hands-on
  50. 50. S3: Hands-on
  51. 51. S3: Highly durable. Highly available. Tolerant to two simultaneo
  52. 52. durability
  53. 53. 99.999999999% durability
  54. 54. Objects in S3 (chart: billions of objects per quarter, Q4 2006 to Q3 2011; labelled value 556B)
  55. 55. 370,000 peak transactions per second
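To put the 99.999999999% durability figure from slide 53 in perspective, here is a minimal Python sketch. It assumes the figure is a per-object, per-year durability and that object losses are independent; both are assumptions made for illustration, and `expected_annual_losses` is a hypothetical helper, not part of any AWS API.

```python
from fractions import Fraction

# Durability quoted on slide 53: 99.999999999% ("eleven nines").
# Assumption: the figure applies per object, per year.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count, durability=DURABILITY):
    """Expected number of objects lost in a year, assuming independent losses."""
    return object_count * (1 - durability)


if __name__ == "__main__":
    # 556 billion is the object count labelled on the slide 54 chart.
    losses = expected_annual_losses(556_000_000_000)
    print(f"Expected losses per year: {float(losses):.2f} objects")
```

Exact fractions are used instead of floats because `1 - 0.99999999999` loses precision in floating point.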
  56. 56. 4: Payment options
  57. 57. Pay as you go
  58. 58. Gb/month
  59. 59. ECU/hour
  60. 60. No minimum
  61. 61. No subscriptions
  62. 62. Pricing tiers
  63. 63. Consolidated billing
  64. 64. Options
  65. 65. On-demand
  66. 66. Reserved capacity
  67. 67. Hands-on
  68. 68. Spot Market: Hands-on
  69. 69. Bandwidth
  70. 70. Free inbound
  71. 71. Import/Export
  72. 72. Reduced outbound
  73. 73. Pricing calculator
  74. 74. aws.amazon.com/calculator
  75. 75. 5: Availability Zones
  76. 76. us-east-1 us-west-1 us-west-2 us-gov-west-1 eu-west-1 ap-southeast-1 ap-northeast-1
  77. 77. eu-west-1a eu-west-1b eu-west-1c
  78. 78. Building blocks: Foundational services
  79. 79. Compute
  80. 80. Elastic Compute Cloud
  81. 81. EC2: Elastic Compute Cloud
  82. 82. Hands-on
  83. 83. Elastic compute infrastructure: Hands-on
  84. 84. ECU: Equivalent to 1.0 - 1.2 GHz 2007 Opteron or 2007 Xeon
  85. 85. ECU: EC2 Compute Unit. Equivalent to 1.0 - 1.2 GHz 2007 Opteron or 2007 Xeon
  86. 86. Instance types
  87. 87. Micro: $0.02. Cluster: $2.10
  88. 88. Standard (m1): 1 ECU. 1.7 GB memory. 160 GB ephemeral storage.
  89. 89. High memory (m2): Up to 26 ECU. 8 cores. 68.4 GB memory. 1.69 TB ephemeral storage.
  90. 90. High CPU (c1): Up to 20 ECU. 8 cores. 7 GB memory. 1.69 TB ephemeral storage.
  91. 91. Higher performance
  92. 92. MPI workloads
  93. 93. Bandwidth intensive
  94. 94. Hands-on
  95. 95. CC: Cluster Compute. Hands-on
  96. 96. 2 x Intel Xeon 5570. 23 GB memory. 1.7 TB disk. 33.5 ECUs
  97. 97. HVM
  98. 98. 10 gig E
  99. 99. Placement groups
  100. 100. Full bisection bandwidth
  101. 101. Linpack
  102. 102. November 2010: Cores 7040. Rmax 41.82. Rpeak 82.51
  103. 103. November 2010: 231
  104. 104. June 2011: 451
  105. 105. November 2011
  106. 106. November 2011: 42
  107. 107. WIEN2K Parallel Performance. H size 56,000 (25 GB). Runtime (16x8 processors): Local (Infiniband) 3h:48; Cloud (10 Gbps) 1h:30 ($40). 1200 atom unit cell; SCALAPACK+MPI diagonalization, matrix size 50k-100k. Credit: K. Jorissen, F. D. Villa, and J. J. Rehr (U. Washington)
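The WIEN2K runtimes on slide 107 can be turned into a speedup and a rough cost per core-hour. The sketch below assumes that "16x8 processors" means 128 cores and that the $40 covers the whole 1h:30 cloud run; neither is stated explicitly on the slide, and `to_minutes` is a hypothetical helper for the slide's "Hh:MM" notation.

```python
def to_minutes(runtime):
    """Convert a runtime written as 'Hh:MM' (the slide's notation) to minutes."""
    hours, minutes = runtime.split("h:")
    return int(hours) * 60 + int(minutes)


# Figures copied from slide 107.
local_minutes = to_minutes("3h:48")   # local cluster, Infiniband
cloud_minutes = to_minutes("1h:30")   # cloud, 10 Gbps Ethernet

speedup = local_minutes / cloud_minutes

# Assumption: 16 nodes x 8 cores each, all billed for the full run.
cores = 16 * 8
core_hours = cores * cloud_minutes / 60
cost_per_core_hour = 40 / core_hours  # $40 is the slide's quoted cost

if __name__ == "__main__":
    print(f"Speedup (local / cloud): {speedup:.2f}x")
    print(f"Core-hours used: {core_hours:.0f}")
    print(f"Cost per core-hour: ${cost_per_core_hour:.3f}")
```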
  108. 108. GPU computation
  109. 109. Hands-on
  110. 110. CG: Cluster Compute with GPGPU. Hands-on
  111. 111. 2 x NVIDIA M2050
  112. 112. 2 x Intel Xeon 5570. 23 GB memory. 1.7 TB disk. 2 x NVIDIA M2050
  113. 113. Flexible cluster control
  114. 114. API
  115. 115. Hands-on
  116. 116. SGE: Hands-on
  117. 117. LSF
  118. 118. Condor
  119. 119. Rocks+
  120. 120. Slurm
  121. 121. Included with all instances and block storage
  122. 122. CloudWatch: Included with all instances and block storage
  123. 123. Custom metrics
  124. 124. Storage
  125. 125. Simple Storage Service
  126. 126. S3: Simple Storage Service
  127. 127. Files in directories
  128. 128. Objects in buckets
  129. 129. http://s3.amazonaws.com/bucketname/objectid http://bucketname.s3.amazonaws.com/objectid
  130. 130. https://s3.amazonaws.com/bucketname/objectid https://bucketname.s3.amazonaws.com/objectid
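Slides 129 and 130 show two ways of addressing the same object: the bucket in the path, or the bucket as a subdomain. A minimal Python sketch of both forms follows. The function names are hypothetical, the endpoint is the one shown on the slides, and percent-encoding the object id is an assumption about how keys with spaces would be handled.

```python
from urllib.parse import quote

# Endpoint as shown on slides 129 and 130.
S3_ENDPOINT = "s3.amazonaws.com"


def path_style_url(bucket, object_id, scheme="https"):
    """Bucket name in the path: scheme://s3.amazonaws.com/bucket/object."""
    return f"{scheme}://{S3_ENDPOINT}/{bucket}/{quote(object_id)}"


def virtual_hosted_url(bucket, object_id, scheme="https"):
    """Bucket name as a subdomain: scheme://bucket.s3.amazonaws.com/object."""
    return f"{scheme}://{bucket}.{S3_ENDPOINT}/{quote(object_id)}"


if __name__ == "__main__":
    print(path_style_url("bucketname", "objectid"))
    print(virtual_hosted_url("bucketname", "objectid", scheme="http"))
```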
  131. 131. 5 TB
  132. 132. Large object support: 5 TB
  133. 133. Parallel uploads
  134. 134. Import/Export
  135. 135. Managed encryption
  136. 136. 99.99% durability
  137. 137. Reduced redundancy storage: 99.99% durability
  138. 138. Elastic Block Store
  139. 139. EBS: Elastic Block Store
  140. 140. Flexible, off-instance block storage
  141. 141. 1 GB to 1 TB
  142. 142. Scalable: 1 GB to 1 TB
  143. 143. Exposed as a device
  144. 144. Attached to a running instance. Exposed as a device
  145. 145. Snapshot to S3
  146. 146. Hands-on
  147. 147. Public Datasets: Hands-on
  148. 148. Databases
  149. 149. Databases on EC2
  150. 150. Oracle and MySQL
  151. 151. Managed. High availability. Read replicas.
  152. 152. Relational Database Service: Managed. High availability. Read replicas.
  153. 153. High scale. Highly available. Key/attribute store
  154. 154. SimpleDB: High scale. Highly available. Key/attribute store
  155. 155. No server to provision or manage.
  156. 156. Perfect for metadata
  157. 157. Messaging & notifications
  158. 158. Hands-on
  159. 159. Simple Queue Service: Hands-on
  160. 160. Hands-on
  161. 161. Simple Notification Service: Hands-on
  162. 162. Technical & Scientific Computing
  163. 163. Elasticity
  164. 164. Research is bursty
  165. 165. Traditional capacity is static
  166. 166. Chart (capacity over time): predicted capacity vs. estimated demand
  167. 167. Chart (capacity over time): infrastructure investment vs. estimated demand; barrier to entry
  168. 168. Chart (capacity over time): infrastructure vs. real demand
  169. 169. Chart (capacity over time): elastic capacity vs. real demand
  170. 170. Rapid response
  171. 171. Removing constraints
  172. 172. Research is constrained
  173. 173. Constrained by static infrastructure
  174. 174. Unconstrained
  175. 175. Larger systems, more molecules, more stars, higher order species...
  176. 176. Unconstrained by scale
  177. 177. 30,000 cores
  178. 178. Unconstrained by time: Upcoming conference, grant submissions, impatience, exploratory “spike”
  179. 179. 1 core for 100 hours
  180. 180. 100 cores for 1 hour
  181. 181. 10k cores in 45 minutes
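The point of slides 179 to 181 is that, when billing is per core-hour, the same amount of compute can be bought as a long run on few cores or a short run on many. A minimal sketch of that arithmetic follows; it assumes a perfectly parallel workload that scales linearly with core count, which real jobs only approximate, and both helper names are hypothetical.

```python
def core_hours(cores, hours):
    """Total compute consumed, in core-hours."""
    return cores * hours


def wall_clock_hours(total_core_hours, cores):
    """Elapsed time for a job, assuming ideal linear scaling across cores."""
    return total_core_hours / cores


# Slides 179 and 180: the same 100 core-hours, two very different waits.
one_core = core_hours(1, 100)
hundred_cores = core_hours(100, 1)

# Slide 181: 10k cores for 45 minutes.
big_run = core_hours(10_000, 0.75)

if __name__ == "__main__":
    print(f"1 core x 100 h = {one_core} core-hours")
    print(f"100 cores x 1 h = {hundred_cores} core-hours")
    print(f"10k cores x 45 min = {big_run:.0f} core-hours")
    days_on_one_core = wall_clock_hours(big_run, 1) / 24
    print(f"The same work on a single core: {days_on_one_core:.1f} days")
```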
  182. 182. Unconstrained by cost
  183. 183. Optimising for price
  184. 184. On-demand
  185. 185. Reserved capacity
  186. 186. 1 year term. Total cost for 1-year term of 2 application servers, steady state usage:
        Option                            Usage Fee   One-time Fee   Total    Savings
        Option 1: On-Demand only          $1493       -              $1493    -
        Option 2: On-Demand + Reserved    $1008       $227           $1234    ~20%
        Option 3: All reserved            $528        $455           $983     ~35%
  187. 187. 3 years term. Total cost for 3-year term of 2 application servers, steady state usage:
        Option                            Usage Fee   One-time Fee   Total    Savings
        Option 1: On-Demand only          $4479       -              $4479    -
        Option 2: On-Demand + Reserved    $3024       $350           $3374    ~30%
        Option 3: All reserved            $1584       $700           $2284    ~50%
  188. 188. Chart: on-demand vs. reserved instances (series: On-Demand, 1-year RI, 3-year RI; x-axis 1 to 24, y-axis 0 to 450)
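The totals and savings on slides 186 and 187 can be recomputed from the usage and one-time fees. The sketch below copies the fees from the slides and measures savings against the On-Demand only option; treating that option as the baseline is an assumption, and `summarise` is a hypothetical helper.

```python
# Fees in USD, copied from slides 186 and 187: (usage fee, one-time fee).
PRICING = {
    "1-year": {
        "On-Demand only": (1493, 0),
        "On-Demand + Reserved": (1008, 227),
        "All reserved": (528, 455),
    },
    "3-year": {
        "On-Demand only": (4479, 0),
        "On-Demand + Reserved": (3024, 350),
        "All reserved": (1584, 700),
    },
}


def summarise(term):
    """Return {option: (total cost, % saved vs. On-Demand only)} for one term."""
    options = PRICING[term]
    baseline = sum(options["On-Demand only"])
    summary = {}
    for name, (usage_fee, one_time_fee) in options.items():
        total = usage_fee + one_time_fee
        saving = round(100 * (baseline - total) / baseline, 1)
        summary[name] = (total, saving)
    return summary


if __name__ == "__main__":
    for term in PRICING:
        for name, (total, saving) in summarise(term).items():
            print(f"{term:7} {name:22} ${total:5} {saving:5.1f}% saved")
```

Recomputed this way, the totals agree with the slides to within a dollar, and the savings come out slightly below the slides' rounded percentages.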
  189. 189. Spot Instances: Hands-on
  190. 190. Placeholder
  191. 191. On-demand + Reserved + Spot
  192. 192. “20th Century architectures”
  193. 193. Driven by analysis metrics
  194. 194. Increasing usability
  195. 195. Amazon Machine Image
  196. 196. Community AMIs
  197. 197. http://www.cloudbiolinux.com/
  198. 198. http://usegalaxy.org/cloud
  199. 199. Reproducibility
  200. 200. Detailed logging: For S3 access
  201. 201. Application logging: Archive to S3 for durability
  202. 202. Automation
  203. 203. Application tier: Code. Configuration
  204. 204. Application tier: Code. Configuration
  205. 205. Application tier: Code. Configuration. Service tier: Integration. Operating system settings. Services + configuration. Launch configuration
  206. 206. Application tier: Code. Configuration. Service tier: Integration. Operating system settings. Services + configuration. Launch configuration
  207. 207. Application tier: Code. Configuration. Service tier: Integration. Operating system settings. Services + configuration. Launch configuration. Infrastructure tier: AMIs. Architecture. Multi-AZ. Scaling rules. Security groups. Middleware
  208. 208. Value baked into each tier
  209. 209. Service tier: Configuration & optimization. Technology choices
  210. 210. Infrastructure tier: Architecture. Configuration.
  211. 211. Automation maximises this value
  212. 212. CloudFormation: Hands-on
  213. 213. Template
  214. 214. Template: Defines a full infrastructure stack
  215. 215. Resources: Auto-scaling, RDS, EC2, SNS, SimpleDB, EBS, SQS, Elastic Beanstalk, CloudWatch, Security groups, Tags
  216. 216. Template. CloudFormation. Provisioned resources
  217. 217. Complete definition: Atomic, idempotent provisioning.
  218. 218. JSON: Declarative language
  219. 219. {
        "AWSTemplateFormatVersion" : "2010-09-09",
        "Description" : "Create an EC2 instances",
        "Parameters" : {
          "KeyName" : {
            "Description" : "Name of an existing EC2 KeyPair to enable SSH access to the instance",
            "Type" : "String"
          }
        },
        "Mappings" : {
          "RegionMap" : {
            "us-east-1" : { "AMI" : "ami-76f0061f" },
            "us-west-1" : { "AMI" : "ami-655a0a20" },
            "eu-west-1" : { "AMI" : "ami-7fd4e10b" },
            "ap-southeast-1" : { "AMI" : "ami-72621c20" },
            "ap-northeast-1" : { "AMI" : "ami-8e08a38f" }
          }
        },
        "Resources" : {
          "Ec2Instance" : {
            "Type" : "AWS::EC2::Instance",
            "Properties" : {
              "KeyName" : { "Ref" : "KeyName" },
              "ImageId" : { "Fn::FindInMap" : [ "RegionMap", { "Ref" : "AWS::Region" }, "AMI" ]},
              "UserData" : { "Fn::Base64" : "80" }
            }
          }
        },
        "Outputs" : {
          "InstanceId" : {
            "Description" : "InstanceId of the newly created EC2 instance",
            "Value" : { "Ref" : "Ec2Instance" }
          },
          "AZ" : {
            "Description" : "Availability Zone of the newly created EC2 instance",
            "Value" : { "Fn::GetAtt" : [ "Ec2Instance", "AvailabilityZone" ] }
          },
          "PublicIP" : {
            "Description" : "Public IP address of the newly created EC2 instance",
            "Value" : { "Fn::GetAtt" : [ "Ec2Instance", "PublicIp" ] }
          }
        }
      }
  220. 220. The same template, annotated with its sections: Headers (AWSTemplateFormatVersion, Description), Parameters, Mappings, Resources, Outputs.
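To show how the `Ref` and `Fn::FindInMap` entries in the template on slides 219 and 220 pick a region-specific AMI, here is a small local resolver in Python. It is an illustrative sketch only: it handles just those two constructs on a trimmed copy of the template, the `resolve` function is hypothetical, and it is not how CloudFormation itself is implemented.

```python
import json

# Trimmed copy of the slide's template: two regions, no Outputs or UserData.
TEMPLATE = json.loads("""
{
  "Mappings": {
    "RegionMap": {
      "us-east-1": {"AMI": "ami-76f0061f"},
      "eu-west-1": {"AMI": "ami-7fd4e10b"}
    }
  },
  "Resources": {
    "Ec2Instance": {
      "Type": "AWS::EC2::Instance",
      "Properties": {
        "KeyName": {"Ref": "KeyName"},
        "ImageId": {"Fn::FindInMap": ["RegionMap", {"Ref": "AWS::Region"}, "AMI"]}
      }
    }
  }
}
""")


def resolve(node, template, params):
    """Recursively replace Ref and Fn::FindInMap nodes with concrete values."""
    if isinstance(node, dict):
        if set(node) == {"Ref"}:
            # Look the name up among the supplied parameters.
            return params[node["Ref"]]
        if set(node) == {"Fn::FindInMap"}:
            map_name, top_key, second_key = (
                resolve(arg, template, params) for arg in node["Fn::FindInMap"]
            )
            return template["Mappings"][map_name][top_key][second_key]
        return {key: resolve(value, template, params) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, template, params) for item in node]
    return node


if __name__ == "__main__":
    # "my-key" is a made-up key pair name used only for this example.
    params = {"KeyName": "my-key", "AWS::Region": "eu-west-1"}
    properties = TEMPLATE["Resources"]["Ec2Instance"]["Properties"]
    print(resolve(properties, TEMPLATE, params))
```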
  221. 221. Bootstrap: Hands-on
  222. 222. Chef & Puppet
  223. 223. Hands-on
  224. 224. Elastic MapReduce: Hands-on
  225. 225. Hadoop for data intensive analytics
  226. 226. Painful at scale
  227. 227. S3: Input data
  228. 228. S3: Input data. Code. Elastic MapReduce
  229. 229. S3: Input data. Code. Elastic MapReduce. Name node
  230. 230. S3: Input data. Code. Elastic MapReduce. Name node. Elastic cluster
  231. 231. S3: Input data. Code. Elastic MapReduce. Name node. Elastic cluster. HDFS
  232. 232. S3: Input data. Code. Elastic MapReduce. Name node. Elastic cluster. HDFS. Queries + BI: Via JDBC, Pig, Hive
  233. 233. S3: Input data. Code. Elastic MapReduce. Name node. Elastic cluster. HDFS. Queries + BI: Via JDBC, Pig, Hive. Output: S3 + SimpleDB
  234. 234. S3: Input data. Elastic MapReduce. Output: S3 + SimpleDB
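The programming model behind the Hadoop and Elastic MapReduce slides above (input data in, code applied across a cluster, output written out) can be sketched in a few lines of plain Python. This is an in-memory word count that only illustrates the map, shuffle and reduce phases; it does not use Hadoop or any EMR API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["lots of data", "lots of uses", "lots of users"]))
```

On a real cluster the map and reduce calls run on different nodes and the shuffle moves data between them; here everything runs in one process.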
  235. 235. Hands-on
  236. 236. Spot: Hands-on
  237. 237. Enabling collaboration
  238. 238. Data
  239. 239. Lots of data
  240. 240. Lots of data, lots of uses
  241. 241. Lots of data, lots of uses, lots of users
  242. 242. Lots of data, lots of uses, lots of users, lots of locations
  243. 243. Force multipliers
  244. 244. Maximise value of data
  245. 245. Data close to compute
  246. 246. Import/Export
  247. 247. Direct Connect
  248. 248. AMIs, Snapshots, CloudFormation: Hands-on
  249. 249. Public Datasets: Hands-on
  250. 250. Ensuring security
  251. 251. Shared responsibility
  252. 252. Requirement based access
  253. 253. Certification
  254. 254. ISO 27001
  255. 255. SAS70 Type II
  256. 256. Service Organisation Controls (SOC 1): SSAE 16 and ISAE 3402
  257. 257. FISMA Moderate
  258. 258. HIPAA
  259. 259. ITAR: AWS GovCloud (US)
  260. 260. Data access control Detailed logging
  261. 261. Data stays local
  262. 262. Identity & Access Control: Hands-on
  263. 263. Account
  264. 264. Account. Roles: DBA, Developer, Sys admin, Finance
  265. 265. Account. Roles: DBA, Developer, Sys admin, Finance. Users: Sally, Robert, Chris
  266. 266. Security credentials. Multifactor authentication. Management console access. Data read/write access. API level access
  267. 267. Account. Roles: DBA, Developer, Sys admin, Finance. Users: Sally, Robert, Chris
  268. 268. Networking controls
  269. 269. Virtual Private Cloud
  270. 270. Virtual network topology
  271. 271. IP address range. Public and private subnets. Routing tables. Network gateways
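Slide 271 lists an IP address range carved into public and private subnets. The sketch below shows that carving with Python's standard `ipaddress` module. The 10.0.0.0/16 range, the /24 subnet size and both function names are assumptions chosen for illustration; the slides do not specify any addresses.

```python
import ipaddress


def plan_subnets(vpc_cidr, subnet_prefix, names):
    """Assign consecutive subnets of the given prefix length to the given names."""
    vpc = ipaddress.ip_network(vpc_cidr)
    return dict(zip(names, vpc.subnets(new_prefix=subnet_prefix)))


def subnet_of(address, plan):
    """Return the name of the subnet containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in plan.items():
        if ip in network:
            return name
    return None


# Hypothetical address plan: one /16 range split into /24 subnets.
plan = plan_subnets("10.0.0.0/16", 24, ["public", "private"])

if __name__ == "__main__":
    for name, network in plan.items():
        print(f"{name:8} {network} ({network.num_addresses} addresses)")
    print(subnet_of("10.0.1.17", plan))
```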
  272. 272. Network access control
  273. 273. Inbound ACLs. Outbound ACLs. IPsec VPN
  274. 274. Public subnet: Public facing website
  275. 275. Public subnet. Network ACLs + security groups. Private subnet: Multi-tier applications
  276. 276. Public subnet. Private subnet. IPsec VPN. On-premise: Extend your data centre
  277. 277. Private subnet. IPsec VPN. On-premise: Extend your data centre
  278. 278. aws.amazon.com/security
  279. 279. End of Part One
  280. 280. cloudsupercomputing.net/tutorial
  281. 281. aws.amazon.com/awscredits
  282. 282. Part Two
  283. 283. Hands-on: Loosely coupled systems
  284. 284. High scale, loosely coupled system
  285. 285. Embarrassingly parallel
  286. 286. Decoupled, batch workflows
  287. 287. Tasks. Instances
  288. 288. Tasks. Queue. Instances
  289. 289. Tasks. Queue. Instances
  290. 290. Tasks. Queue. Instances. Increase instance size
  291. 291. Tasks. Queue. Instances. Increase instance size
  292. 292. Tasks. Queue. Instances. Increase instance count
  293. 293. Tasks. Queue. Instances. Results store
  294. 294. Tasks. Queue. On-premise. Instances. Results store
  295. 295. Tasks. Queue. On-premise. Instances. Results store
  296. 296. Tasks. Queue. On-premise. Instances. Results store
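The tasks, queue, instances and results store pattern built up on the slides above can be sketched locally with Python's standard library. In this sketch a `queue.Queue` stands in for a message queue, threads stand in for instances, and a dictionary stands in for the results store; these stand-ins and the `run_batch` name are assumptions made for illustration, not the tutorial's own code.

```python
import queue
import threading


def run_batch(tasks, worker_count, work):
    """Drain a task queue with a pool of workers and collect results by task."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    results = {}                 # stand-in for the results store
    results_lock = threading.Lock()

    def worker():
        # Each worker pulls tasks until the queue is empty, then exits.
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return
            result = work(task)
            with results_lock:
                results[task] = result

    # Raising worker_count mirrors the "increase instance count" slide.
    workers = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in workers:
        thread.start()
    for thread in workers:
        thread.join()
    return results


if __name__ == "__main__":
    squares = run_batch(range(10), worker_count=4, work=lambda n: n * n)
    print(sorted(squares.items()))
```

Because workers only share the queue and the results store, adding or removing workers needs no change to the tasks themselves, which is what makes the pattern suit embarrassingly parallel batch work. Tasks must be hashable here because they are used as dictionary keys.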
  297. 297. Batch processing: Monitoring. Auto-scaling. Queuing. Spot. Automation.
  298. 298. Configure
  299. 299. 150
  300. 300. Autoscaling. Automation. Don’t forget to shut down your instances!
  301. 301. Hadoop with Elastic MapReduce: Native. Streaming interface. Hive. Spot with EMR.
  302. 302. Advanced EMR with Myrna: Bioinformatics tools and large datasets. Thanks to Ben Langmead.
  303. 303. $100
  304. 304. Credentials: Account -> Security Credentials. Access key, secret key, account number
  305. 305. AWS staff
  306. 306. cloudsupercomputing.net/tutorial
  307. 307. Hands-on: Parallel Computation
  308. 308. Tightly coupled systems
  309. 309. 10 gig E
  310. 310. 64 core parallel cluster: CC1. Custom AMI. EBS. Monitoring. MIT StarCluster. CloudFormation.
  311. 311. Multi-GPU: CG1. CUDA 4. Compile & execute. Benchmark against CPU.
  312. 312. OpenFOAM: Computational Fluid Dynamics with CC1 on EC2.
  313. 313. cloudsupercomputing.net/tutorial
  314. 314. AGENDA. SC11, Monday 14th November, 2011. Cloud Concepts. Building Blocks. Technical & Scientific Computing. Loosely coupled systems. Hands-on Session #1. Parallel computation. Hands-on Session #2. Wrap up, drinks
  315. 315. Understand the services, tools and patterns for building high performance systems in the
  316. 316. YOU ARE CORDIALLY INVITED TO THE Amazon Web Services SC11 BASH. NETWORKING, DRINKS and GOODIES. BOOTH #6202
  317. 317. aws.amazon.com/about-aws/sc11
  318. 318. Thank you!
  319. 319. Questions & comments: matthew@amazon.com, @mza on Twitter
