High Performance Cloud Computing

A discussion of high performance computing, including high throughput, tightly coupled, parallel jobs and map/reduce, on Amazon EC2.

Published in: Technology, Business
  • Good morning. My name is X, and I'm Y for Amazon Web Services, based in Singapore. Today we will talk about cloud computing and explain why it's important to know about it.
  • There has been a lot of interest in cloud computing from the life science community lately. Here are examples of two papers. There have also been articles in Nature.
  • If you use Facebook, you surely know Farmville, the most successful game so far. The company behind Farmville is Zynga.com.
  • Another example, also a hedge fund. Its demand is a lot more spiky because it does high-frequency trading.
  • Map: Catalog k-mers. Emit k-mers in the genome and reads. Shuffle: Collect seeds. Conceptually build a hash table of k-mers and their occurrences. Reduce: End-to-end alignment. If a read aligns end-to-end with ≤ k errors, record the alignment.
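The map, shuffle, and reduce steps in the CloudBurst note can be illustrated with a small single-process sketch. This is not CloudBurst's implementation and uses no Hadoop API: the function names (`map_kmers`, `shuffle`, `reduce_alignments`, `align`) are hypothetical, the error bound is called `max_errors` to keep it distinct from the k-mer length `k`, and only mismatches are counted, which is a simplifying assumption.

```python
from collections import defaultdict


def map_kmers(name, seq, k, is_ref):
    """Map: emit (k-mer, (is_ref, name, offset)) for every k-mer in one sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k], (is_ref, name, i)


def shuffle(pairs):
    """Shuffle: group emitted values by k-mer, i.e. a hash table of seed occurrences."""
    table = defaultdict(list)
    for kmer, value in pairs:
        table[kmer].append(value)
    return table


def reduce_alignments(table, reference, reads, max_errors):
    """Reduce: extend each seed shared by genome and read to a full-read comparison."""
    hits = set()
    for values in table.values():
        ref_offsets = [off for is_ref, _, off in values if is_ref]
        read_seeds = [(name, off) for is_ref, name, off in values if not is_ref]
        for ref_off in ref_offsets:
            for name, read_off in read_seeds:
                start = ref_off - read_off  # where the whole read would begin
                read = reads[name]
                if start < 0 or start + len(read) > len(reference):
                    continue
                window = reference[start:start + len(read)]
                # Assumption: errors are mismatches only (no insertions or deletions).
                mismatches = sum(a != b for a, b in zip(window, read))
                if mismatches <= max_errors:
                    hits.add((name, start, mismatches))
    return sorted(hits)


def align(reference, reads, k, max_errors):
    """Run map, shuffle, and reduce in one process for illustration."""
    pairs = list(map_kmers("ref", reference, k, True))
    for name, seq in reads.items():
        pairs.extend(map_kmers(name, seq, k, False))
    return reduce_alignments(shuffle(pairs), reference, reads, max_errors)
```

Calling `align("ACGTACGTTAGC", {"r1": "CGTTAG"}, 3, 1)` reports one alignment for `r1`, at offset 5 with no mismatches. In a real map/reduce job each k-mer group would be reduced on a different node, which is why the seeds are grouped by key before any alignment is attempted.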
  • High Performance Cloud Computing

    1. High Performance Cloud Computing. Matt Wood, Technology Evangelist
    2. Hello.
    3. Thank you.
    4. HPC with Amazon Web Services
    5. HPC with Amazon Web Services: High throughput; Tightly coupled, parallel; Data intensive, map/reduce; Getting started, pricing & access
    6. Prelude
    7. Infrastructure services
    8. Idea; Results
    9. Idea; Heavy lifting; Results
    10. Idea; Heavy lifting (70%): Scale, Redundancy, Orchestration, Capacity Management, Procurement; Results
    11. Idea; Infrastructure (30%); Results
    12. Idea; AWS; Results
    13. Idea; AWS; Results
    14. Compute
    15. Compute; Storage
    16. Compute; Storage; Databases
    17. Compute; Storage; Databases; Services & Support
    18. Dependable
    19. API; Compute; Storage; Databases; Services & Support; Fault tolerant infrastructure
    20. API; Compute; Storage; Databases; Services & Support; Fault tolerant infrastructure
    21. Flexible
    22. Compute; Storage; Databases; Services & Support
    23. Compute; Storage; Databases; Services & Support
    24. CentOS; Ubuntu; RHEL; Windows; Compute; Storage; Databases; Services & Support
    25. Low cost
    26. PAYG
    27. GB/month
    28. Compute/hour
    29. Economies of scale
    30. Section 1: High Throughput Computing
    31. Embarrassingly parallel
    32. Independent tasks
    33. Rapid data generation
    34. Batch processing
    35. Constraints
    36. Constrained by capacity
    37. Constrained by capacity: more molecules, bigger systems, more simulations, more dimensions
    38. Constrained by time
    39. Constrained by time: upcoming conference, grant submissions, impatience!, exploratory “spike” run
    40. Elastic capacity
    41. EC2: Elastic Compute Cloud
    42. Instances
    43. Virtualised
    44. Linux: 20 seconds
    45. Windows: a few minutes
    46. ec2-run-instances
    47. mza$ ssh -i web/us-east/aws-web.pem root@ec2-204-236-247-169.compute-1.amazonaws.com
        Last login: Wed Jun 22 11:15:20 2011 from 82.26.6.99
        CentOS v5.4 HVM x64
        Welcome to an EC2 Public Image :-)
        [root@ip-10-17-135-244 ~]#
    48. ec2-terminate-instances
    49. Sandbox
    50. Custom virtual machines
    51. Instance sizes
    52. Small (and micro)
    53. Small (and micro): 1.0 GHz, 1.7 GB RAM
    54. Large & Extra large
    55. Large & Extra large: 4 cores, 15 GB RAM
    56. High memory & High CPU: 8 cores, 68 GB RAM
    57. Cluster compute
    58. Cluster compute: dual Intel i7 “Nehalem”, 23 GB RAM, GPGPU, hardware virtualisation, 1.7 TB scratch, fast interconnects
    59. Rapid provisioning
    60. 10k in 45 minutes
    61. High scale
    62. Elastic capacity
    63. Capacity over time: estimated demand
    64. Capacity over time: infrastructure investment, estimated demand
    65. Capacity over time: infrastructure, real demand
    66. Capacity over time: elastic capacity, real demand
    67. Optimise for throughput
    68. Tasks; Instances
    69. Tasks; Queue; Instances
    70. Tasks; Queue; Instances
    71. Vertical scale
    72. Tasks; Queue; Instances: increase instance size
    73. Tasks; Queue; Instances: increase instance size
    74. Horizontal scale
    75. Tasks; Queue; Instances: increase instance count
    76. Tasks; Queue; Instances; Results; Store
    77. Tasks; Queue; On-premise; Instances; Results; Store
    78. Tasks; Queue; On-premise; Instances; Results; Store
    79. Tasks; Queue; On-premise; Instances; Results; Store
    80. Optimise for cost: maximise bang for buck
    81. Bang for buck: instance size
    82. Bang for buck: monitoring & metrics
    83. Bang for buck: cost options
    84. On-demand
    85. Reserved Instances
    86. Spot Instances
    87. Built for batch
    88. Persistent requests
    89. All or nothing
    90. Mix and match
    91. Tools
    92. Wide ecosystem
    93. Oracle Grid Engine
    94. LSF
    95. Condor
    96. Rocks+
    97. MIT StarCluster
    98. Slurm
    99. Use existing tools
    100. Reuse existing tools
    101. Reproducibility
    102. http://www.cloudbiolinux.com/
    103. http://usegalaxy.org/cloud
    104. Section 2: Data Intensive Computing
    105. Big data is a big opportunity
    106. Idea; Results
    107. Enter Hadoop
    108. map/reduce
    109. Independent computation
    110. Distributed platform
    111. HDFS
    112. High scale: > 1000 nodes
    113. Painful: Set up. Configure. Optimise. Run.
    114. Elastic MapReduce
    115. Undifferentiated heavy lifting
    116. S3: input data
    117. S3: input data, code; Elastic MapReduce
    118. S3: input data, code; Elastic MapReduce; Name node
    119. S3: input data, code; Elastic MapReduce; Name node; Elastic cluster
    120. S3: input data, code; Elastic MapReduce; Name node; Elastic cluster; HDFS
    121. S3: input data, code; Elastic MapReduce; Name node; Elastic cluster; HDFS; Queries + BI via JDBC, Pig, Hive
    122. S3: input data, code; Elastic MapReduce; Name node; Elastic cluster; HDFS; Queries + BI via JDBC, Pig, Hive; Output: S3 + SimpleDB
    123. S3: input data; Elastic MapReduce; Output: S3 + SimpleDB
    124. Crossbow: Rapid whole genome SNP analysis. Preprocessed reads; Map: Bowtie; Sort: Bin and partition; Reduce: SoapSNP. Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Genome Biol 10(11): R134.
    125. CloudBurst: Catalog k-mers; Collect seeds; End-to-end alignment. http://cloudburst-bio.sourceforge.net; Bioinformatics 2009 25: 1363-1369
    126. Data location
    127. Generate in the cloud
    128. Upload
    129. Free inbound bandwidth
    130. Import/export
    131. Maximise value from the upload
    132. Public Hosted Datasets
    133. Ensembl
    134. 1000 Genomes
    135. Genbank
    136. Section 3: Parallel Computing
    137. Tightly coupled
    138. Cluster compute: dual Intel i7 “Nehalem”, 23 GB RAM, GPGPU, hardware virtualisation, 1.7 TB scratch
    139. Cluster compute: 10 GigE
    140. Cluster compute: placement group
    141. 250th
    142. 450th
    143. Cores: 7040; Rmax: 41.82; Rpeak: 82.51
    144. GPU
    145. 2 x Tesla; 800+ cores
    146. Section 4: Getting Started
    147. HPC
    148. Accessible
    149. Flexible
    150. Team access
    151. Identity and access
    152. API level rights management
    153. Account
    154. Account: billing, account credentials, MFA
    155. Account; Roles: Student, Admin, Faculty, Post-doc
    156. Account; Roles: Student, Admin, Faculty, Post-doc; Users: Sally, Robert, Chris
    157. Security credentials; Multifactor authentication; Management console access; Data read/write access; API level access
    158. Account; Roles: Student, Admin, Faculty, Post-doc; Users: Sally, Robert, Chris
    159. Detailed accounting
    160. Education grants
    161. JISC
    162. Free tier
    163. HPC primer
    164. HPC primer: aws.amazon.com/hpc
    165. Security
    166. Shared responsibility
    167. Certified + audited: ISO 27001, SAS 70 Type II, HIPAA and FISMA
    168. Data stays local
    169. aws.amazon.com/security
    170. aws.amazon.com
    171. Thank you!
    172. Questions + comments: matthew@amazon.com, @mza on Twitter
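
The high throughput pattern in the transcript (independent tasks pushed through a queue, served by a variable number of instances, with results collected at the end) can be sketched locally with the Python standard library. This is a minimal stand-in, not an AWS API: threads play the role of instances, an in-memory dictionary plays the role of a results store, and the names `run_batch`, `work`, and `instance_count` are hypothetical.

```python
import queue
import threading


def run_batch(tasks, work, instance_count):
    """Push independent tasks through a queue served by `instance_count` workers."""
    task_queue = queue.Queue()
    results = {}              # in-memory stand-in for a results store
    lock = threading.Lock()

    for task_id, payload in enumerate(tasks):
        task_queue.put((task_id, payload))

    def worker():
        # Each worker thread stands in for one instance pulling from the queue.
        while True:
            try:
                task_id, payload = task_queue.get_nowait()
            except queue.Empty:
                return        # queue drained, so this "instance" can stop
            outcome = work(payload)
            with lock:
                results[task_id] = outcome

    workers = [threading.Thread(target=worker) for _ in range(instance_count)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Return results in task order, whichever worker finished them.
    return [results[i] for i in range(len(tasks))]
```

For example, `run_batch([1, 2, 3, 4], lambda x: x * x, 2)` returns the squares in task order. Raising `instance_count` corresponds to the horizontal scale slide. Because the tasks are independent, the queue is the only coordination point, so workers can be added or removed without changing the tasks themselves.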
