Plenary Talk at ACAT 2010

Published in: Technology, Travel

  1. 1. Scientific Computing with Amazon Web Services. Deepak Singh. ACAT 2010, Jaipur, India
  2. 2. Via Reavel under a CC-BY-NC-ND license
  3. 3. life science industry
  4. 4. Credit: Bosco Ho
  5. 5. By ~Prescott under a CC-BY-NC license
  6. 6. data
  7. 7. Image: Wikipedia
  8. 8. Image: Matt Wood
  9. 9. couldn’t find a good picture for arrays of sensors in the ocean
  10. 10. Image via image editor under a CC-BY License
  11. 11. years
  12. 12. weeks
  13. 13. days
  14. 14. days
  15. 15. minutes days ?
  16. 16. gigabytes
  17. 17. terabytes
  18. 18. petabytes
  19. 19. petabytes
  20. 20. exabytes petabytes ?
  21. 21. Image: Chris Dagdigian
  22. 22. scale has implications
  23. 23. data management
  24. 24. data processing
  25. 25. data sharing
  26. 26. amazon web services
  27. 27. the cloud
  28. 28. has_many :definitions
  29. 29. infrastructure as a service
  30. 30. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  31. 31. Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Messaging: Amazon Simple Queue Service (SQS). Content Delivery: Amazon CloudFront. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  32. 32. Tools: AWS Toolkit for Eclipse, AWS Toolkit for .NET. Isolated Networks: Amazon Virtual Private Cloud. Monitoring: Amazon CloudWatch. Management: AWS Management Console. Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Messaging: Amazon Simple Queue Service (SQS). Content Delivery: Amazon CloudFront. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  33. 33. Your Custom Applications and Services. Tools: AWS Toolkit for Eclipse, AWS Toolkit for .NET. Isolated Networks: Amazon Virtual Private Cloud. Monitoring: Amazon CloudWatch. Management: AWS Management Console. Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Messaging: Amazon Simple Queue Service (SQS). Content Delivery: Amazon CloudFront. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  34. 34. scalable
  35. 35. scalable cost effective
  36. 36. scalable cost effective (pay as you go)
  37. 37. scalable cost effective reliable
  38. 38. scalable cost effective reliable secure
  39. 39. Amazon EC2
  40. 40. servers on demand
  41. 41. highly scalable
  42. 42. elastic
  43. 43. 3000 CPUs for one firm's risk management application. [Chart: Number of EC2 Instances, Wednesday 4/22/2009 through Tuesday 4/28/2009; peak of 3000, 300 CPUs on weekends]
  44. 44. highly available systems
  45. 45. “Everything fails, all the time” -- Werner Vogels
  46. 46. 2.3% AFR in population of 13,250 3.3% AFR in population of 22,400 4.2% AFR in population of 246,000 Source: James Hamilton
  47. 47. assume sw/hw failure
  48. 48. design apps to be resilient
  49. 49. automate & bootstrap
  50. 50. nothing fails
  51. 51. elastic block store
  52. 52. elastic IP
  53. 53. SQS
  54. 54. US East Region: Availability Zone A, Availability Zone B, Availability Zone C, Availability Zone D
  55. 55. on-demand instances, reserved instances, spot instances
  56. 56. data storage
  57. 57. one size does not fit all
  58. 58. Amazon S3
  59. 59. distributed object store
  60. 60. durable
  61. 61. available
  62. 62. [chart: labels not recoverable from the extracted text]
  63. 63. scalable
  64. 64. fast
  65. 65. simple
  66. 66. structured data anyone?
  67. 67. Amazon SimpleDB
  68. 68. zero administration
  69. 69. highly available
  70. 70. schema less
  71. 71. key-value store
  72. 72. Amazon Relational Database Service
  73. 73. single API call
  74. 74. MySQL database
  75. 75. automatic backup
  76. 76. scale up with API call
  77. 77. hosted hadoop service
  78. 78. hadoop easy and simple
  79. 79. Amazon Elastic MapReduce [diagram]: input data is placed in an Amazon S3 input bucket; the application is deployed through the web console or command line tools; Elastic MapReduce runs Hadoop on Amazon EC2 instances; results are written to an S3 output bucket; the user is notified at the end and gets the results.
  80. 80. apache hive http://hadoop.apache.org/hive/
  81. 81. apache pig http://hadoop.apache.org/pig/
  82. 82. cascading http://www.cascading.org/
  83. 83. computing platforms
  84. 84. http://cyclecomputing.com
  85. 85. sudo gem install cloud-crowd http://cyclecomputing.com http://wiki.github.com/documentcloud/cloud-crowd
  86. 86. http://www.rightscale.com
  87. 87. Amazon Elastic MapReduce [diagram]: input data is placed in an Amazon S3 input bucket; the application is deployed through the web console or command line tools; Elastic MapReduce runs Hadoop on Amazon EC2 instances; results are written to an S3 output bucket; the user is notified at the end and gets the results.
  88. 88. application platforms
  89. 89. http://heroku.com
  90. 90. software distribution
  91. 91. http://www.cloudbiolinux.com/
  92. 92. http://bitbucket.org/galaxy/galaxy-central/wiki/Home
  93. 93. data distribution
  94. 94. http://aws.amazon.com/publicdatasets/
  95. 95. problem solving
  96. 96. 3.7 million classifications in just over three days; ~15 million in less than a month; >2.6 million clicks in 100 hours
  97. 97. software & algorithms
  98. 98. Crossbow: Rapid whole genome SNP analysis. Preprocessed reads; Map: Bowtie; Sort: Bin and partition; Reduce: SoapSNP. Langmead B, Trapnell C, Pop M, Salzberg SL. Genome Biol 10 (3): R25.
  99. 99. Crossbow condenses over 1,000 hours of resequencing computation into a few hours without requiring the user to own or operate a computer cluster
  100. 100. doing science
  101. 101. http://bioteam.net
  102. 102. BLAT @ U. PENN. Map 100 million, 100 base paired end reads. Quad core with 5 GB of RAM would take 16 days. 30 high-memory instances; 32 hours; $195.
  103. 103. GALAXY MAPPING. Goal: Create an astrometric catalog of a billion stars with micro arc second precision. Gaia satellite launched 2011; observations till 2017; catalog ready 2019. Problem: Single pass through the data for image processing would take 30 years (on one CPU). Solution: Use AWS.
  104. 104. [Chart: capacity and demand (resources over time), static data center vs. data center in the cloud; unused resources marked]
  105. 105. HEAVY-ION COLLISIONS. Problem: Quark matter physics conference imminent but no compute resources handy. Solution: NIMBUS context broker allowed researchers to provision 300 nodes and get the simulations done.
  106. 106. BELLE MONTE CARLO Credit: Tom Fifield
  107. 107. AWS for the sciences
  108. 108. available resources
  109. 109. task-based resources
  110. 110. shared dataspaces
  111. 111. new software architectures
  112. 112. new computing platforms
  113. 113. http://aws.amazon.com/education
  114. 114. the cloud works
  115. 115. today
  116. 116. Thank you! deesingh@amazon.com Twitter: @mndoci. Presentation ideas from James Hamilton and @lessig
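
The failure-rate figures on slide 46 can be turned into expected failure counts with simple arithmetic. The sketch below assumes that AFR stands for annualized failure rate and that failures are spread evenly over a 365-day year; the function name is illustrative and does not come from the talk.

```python
# Populations and annualized failure rates (AFR) as quoted on slide 46.
POPULATIONS = [
    (13_250, 0.023),
    (22_400, 0.033),
    (246_000, 0.042),
]


def expected_failures_per_year(population: int, afr: float) -> float:
    """Expected failed units per year, assuming AFR = annualized failure rate."""
    return population * afr


for population, afr in POPULATIONS:
    per_year = expected_failures_per_year(population, afr)
    per_day = per_year / 365  # assumption: failures spread evenly over the year
    print(f"{population:>7,} units at {afr:.1%} AFR: "
          f"about {per_year:,.0f} failures/year, {per_day:.1f}/day")
```

At the largest population this works out to more than ten thousand failed units a year, which is the practical meaning of "assume sw/hw failure" on slide 47.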
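
Slide 98 lists Crossbow's stages as map, sort, and reduce. The toy sketch below shows only the shape of that pattern in plain Python. It does not call Bowtie, SoapSNP, or Hadoop; the record format, bin size, and function names are all invented for illustration, and the reads are assumed to be already aligned.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BIN_SIZE = 1000  # hypothetical bin width in bases

# Hypothetical pre-aligned read: (chromosome, position, observed base).
Read = Tuple[str, int, str]
Key = Tuple[str, int]  # (chromosome, bin index)


def map_phase(reads: Iterable[Read]) -> List[Tuple[Key, Read]]:
    """Map: emit (key, read) pairs keyed by chromosome and bin index."""
    return [((chrom, pos // BIN_SIZE), (chrom, pos, base))
            for chrom, pos, base in reads]


def sort_phase(pairs: Iterable[Tuple[Key, Read]]) -> Dict[Key, List[Read]]:
    """Sort: partition reads into bins and order each bin by position."""
    bins: Dict[Key, List[Read]] = defaultdict(list)
    for key, read in pairs:
        bins[key].append(read)
    return {key: sorted(bins[key], key=lambda r: r[1]) for key in sorted(bins)}


def reduce_phase(bins: Dict[Key, List[Read]]) -> Dict[Key, int]:
    """Reduce: summarise each bin. Here only a read count; a real reducer
    would run a SNP caller over the bin."""
    return {key: len(reads) for key, reads in bins.items()}


reads = [("chr1", 1500, "A"), ("chr1", 120, "C"),
         ("chr2", 40, "G"), ("chr1", 1700, "T")]
counts = reduce_phase(sort_phase(map_phase(reads)))
print(counts)
```

Because each bin is reduced independently, the reduce step can be spread across as many machines as there are bins, which is what makes the pattern a fit for a hosted Hadoop service.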
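
The BLAT numbers on slide 102 imply a speedup and a per-instance-hour cost that are easy to work out. The sketch below uses only the figures quoted on that slide and assumes all 30 instances ran for the full 32 hours; the resulting rate is a derived average, not a published AWS price.

```python
# Figures quoted on slide 102 (BLAT @ U. Penn).
SINGLE_MACHINE_DAYS = 16   # quad core with 5 GB of RAM
CLOUD_INSTANCES = 30       # high-memory instances
CLOUD_HOURS = 32           # wall-clock time on AWS
CLOUD_COST_USD = 195.0

single_machine_hours = SINGLE_MACHINE_DAYS * 24
speedup = single_machine_hours / CLOUD_HOURS

# Assumption: every instance was running for the whole 32 hours.
instance_hours = CLOUD_INSTANCES * CLOUD_HOURS
cost_per_instance_hour = CLOUD_COST_USD / instance_hours

print(f"single machine: {single_machine_hours} hours")
print(f"wall-clock speedup: {speedup:.0f}x")
print(f"instance-hours used: {instance_hours}")
print(f"implied cost: ${cost_per_instance_hour:.3f} per instance-hour")
```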
