Talk given at "Cloud Computing for Systems Biology" workshop

Published in: Technology

  1. 1. The role of cloud computing in big biology. Deepak Singh
  2. 2. Via Reavel under a CC-BY-NC-ND license
  3. 3. life science industry
  4. 4. Credit: Bosco Ho
  5. 5. By ~Prescott under a CC-BY-NC license
  6. 6. context
  7. 7. analysis methods
  8. 8. technology
  9. 9. ? ? technology ? ?
  10. 10. back of the room
  11. 11. technology technology technology technology
  12. 12. technology (the word repeated across the slide in rotated text)
  13. 13. Image: Keith Allison under a CC-BY-SA license
  14. 14. inherent characteristics
  15. 15. data driven
  16. 16. multi-dimensional
  17. 17. collaborative
  18. 18. distributed
  19. 19. <amazon web services>
  20. 20. the cloud
  21. 21. has_many :definitions
  22. 22. infrastructure as a service
  23. 23. precursors
  24. 24. virtualization
  25. 25. service oriented architecture
  26. 26. distributed computing
  27. 27. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  28. 28. Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Messaging: Amazon Simple Queue Service (SQS). Content Delivery: Amazon CloudFront. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  29. 29. Isolated Networks: Amazon Virtual Private Cloud. Monitoring: Amazon CloudWatch. Management: AWS Management Console. Tools: AWS Toolkit for Eclipse. Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Messaging: Amazon Simple Queue Service (SQS). Content Delivery: Amazon CloudFront. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  30. 30. Your Custom Applications and Services. Isolated Networks: Amazon Virtual Private Cloud. Monitoring: Amazon CloudWatch. Management: AWS Management Console. Tools: AWS Toolkit for Eclipse. Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Messaging: Amazon Simple Queue Service (SQS). Content Delivery: Amazon CloudFront. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  31. 31. scalable
  32. 32. scalable cost effective
  33. 33. scalable cost effective (Pay as you go)
  34. 34. scalable cost effective reliable
  35. 35. scalable cost effective reliable secure
  36. 36. Amazon EC2
  37. 37. servers on demand
  38. 38. highly scalable
  39. 39. 3000 CPUs for one firm’s risk management application [Chart: Number of EC2 Instances by day, Wednesday 4/22/2009 through Tuesday 4/28/2009; annotation: 300 CPUs on weekends]
  40. 40. design for failure
  41. 41. “Everything fails, all the time” -- Werner Vogels
  42. 42. assume failure
  43. 43. assume failure design backwards
  44. 44. assume failure design backwards nothing fails
  45. 45. highly available systems
  46. 46. elastic block store
  47. 47. elastic IP
  48. 48. SQS
  49. 49. US East Region: Availability Zone A, Availability Zone B, Availability Zone C, Availability Zone D
  50. 50. data storage
  51. 51. one size does not fit all
  52. 52. Amazon S3
  53. 53. distributed object store
  54. 54. durable
  55. 55. available
  56. 56. (slide text not recoverable from extraction)
  57. 57. scalable
  58. 58. fast
  59. 59. simple
  60. 60. structured data anyone?
  61. 61. Amazon SimpleDB
  62. 62. zero administration
  63. 63. highly available
  64. 64. schema less
  65. 65. key-value store
  66. 66. Amazon Relational Database Service
  67. 67. single API call
  68. 68. MySQL database
  69. 69. automatic backup
  70. 70. scale up with API call
  71. 71. futures
  72. 72. futures: master-slave replication, data center failover
  73. 73. what do people do?
  74. 74. solve problems
  75. 75. > 1PB of data in S3
  76. 76. provide platforms & services
  77. 77. Platform as a Service http://heroku.com
  78. 78. Computation as a Service http://cyclecomputing.com
  79. 79. Computational Platforms sudo gem install cloud-crowd http://cyclecomputing.com http://wiki.github.com/documentcloud/cloud-crowd
  80. 80. http://cyclecomputing.com http://wiki.github.com/documentcloud/cloud-crowd
  81. 81. they do science Image: Matt Wood
  82. 82. 3.7 million classifications in just over three days ~15 million in less than a month >2.6 million clicks in 100 hours
  83. 83. Image via image editor under a CC-BY License
  84. 84. Protein Docking @ Pfizer http://bioteam.net
  85. 85. http://aws.amazon.com/publicdatasets/
  86. 86. </amazon web services>
  87. 87. anecdote
  88. 88. collaborative project
  89. 89. 800 GB
  90. 90. Image: Wikipedia Commons
  91. 91. weeks to get started
  92. 92. Image: Matt Wood
  93. 93. Image: Chris Dagdigian
  94. 94. gigabytes
  95. 95. terabytes
  96. 96. petabytes
  97. 97. really fast
  98. 98. constant flux
  99. 99. Image: Chris Dagdigian
  100. 100. data management is not data storage
  101. 101. masterclass Big data & Biology: The implications of petascale science Tuesday November 17 1:30PM - 3:00PM Room: PB253-254-257-258
  102. 102. “science data platform”
  103. 103. deliver data to applications
  104. 104. deliver data to people
  105. 105. typical informatics workflow
  106. 106. Via Christolakis under a CC-BY-NC-ND license
  107. 107. Via Argonne National Labs under a CC-BY-SA license
  108. 108. killer app. Via Argonne National Labs under a CC-BY-SA license
  109. 109. Data Apps
  110. 110. Data Platform App Platform
  111. 111. Data Platform App Platform
  112. 112. Data Platform App Platform
  113. 113. Data Platform data services
  114. 114. application services App Platform
  115. 115. Scalable Data Platform Services APIs Getters Filters Savers WORK
  116. 116. must accommodate change
  117. 117. must scale
  118. 118. highly available
  119. 119. loosely coupled
  120. 120. dynamic
  121. 121. task-based resources
  122. 122. one project one set of resources
  123. 123. no waiting
  124. 124. Protein Docking @ Pfizer http://bioteam.net
  125. 125. distributed mindset
  126. 126. one approach
  127. 127. disk read/writes slow & expensive
  128. 128. data processing fast & cheap
  129. 129. distribute data parallelize reads
  130. 130. map/reduce
  131. 131. distributed data processing at scale
  132. 132. abstracting away hadoop
  133. 133. apache hive http://hadoop.apache.org/hive/
  134. 134. apache pig http://hadoop.apache.org/pig/
  135. 135. cascading http://www.cascading.org/
  136. 136. hosted hadoop service
  137. 137. hadoop easy & simple
  138. 138. Amazon Elastic MapReduce [Diagram: Input Data goes to an Input S3 bucket in Amazon S3; Deploy Application through the Web Console or Command line tools; Elastic MapReduce runs Hadoop on Amazon EC2 Instances over the input dataset; output results go to an Output S3 bucket; Notify at End; Get Results]
  139. 139. developers develop & distribute
  140. 140. scientists/analysts consume
  141. 141. CloudBurst: Catalog k-mers, Collect seeds, End-to-end alignment
  142. 142. Mike Schatz, University of Maryland
  143. 143. Scalable Data Platform Services APIs Getters Filters Savers WORK
  144. 144. IN CONCLUSION
  145. 145. large scale biology
  146. 146. complex multidimensional data
  147. 147. whole lot of data
  148. 148. distributed collaborations
  149. 149. new computing and data architectures
  150. 150. a solution: cloud services
  151. 151. distributed
  152. 152. scalable
  153. 153. economical
  154. 154. here today
  155. 155. Thank you! deesingh@amazon.com Twitter: @mndoci. Presentation ideas from @mza, James Hamilton, and @lessig
