Bio-IT World 2010 - Keynote talk

My keynote from Bio-IT World 2010

  1. There is no magic, there is only awesome: Scientific computing with Amazon Web Services. Deepak Singh, Bio-IT World 2010
  2. By ~Prescott under a CC-BY-NC license
  3. <1>
  4. data
  5. Image: Wikipedia
  6. Source: http://www.nature.com/news/specials/bigdata/index.html
  7. implications
  8. data management, data processing, data sharing
  9. <2>
  10. [slide text garbled in extraction; not recoverable]
  11. [slide text garbled in extraction; not recoverable]
  12. Amazon infrastructure: massive scale, highly available, efficient, service oriented, secure
  13. infrastructure as a service
  14. undifferentiated heavy lifting
  15. Chart: bandwidth consumed by Amazon Web Services vs. bandwidth consumed by Amazon's global websites, 2001-2008
  16. Amazon S3 momentum. Chart: total number of objects stored in Amazon S3: 2.9 billion (Q4 2006), 14 billion (Q4 2007), 40 billion (Q4 2008), 102 billion (Q4 2009). Peak requests: [figure not recoverable]
  17. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  18. The slide 17 layer, plus: Messaging: Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS). Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Content Delivery: Amazon CloudFront.
  19. The slide 18 layers, plus: Tools: AWS Toolkit for Eclipse, AWS SDK for .NET. Isolated Networks: Amazon Virtual Private Cloud. Monitoring: Amazon CloudWatch. Management: AWS Management Console.
  20. The slide 19 layers, topped by: Your Custom Applications and Services.
  21. scalability
  22. > 1 PB of data in S3
  23. elasticity
  24. 3000 CPUs for one firm's risk management application. Chart: number of EC2 instances, Wednesday 4/22/2009 through Tuesday 4/28/2009, annotated "300 CPUs on weekends"
  25. highly available
  26. “Everything fails, all the time” -- Werner Vogels
  27. 2-4% of servers and 1-5% of disk drives will die annually. Source: Jeff Dean, LADIS 2009
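To make those failure rates concrete, a back-of-the-envelope sketch: multiply the fleet size by the annual failure rate. The fleet sizes (1,000 servers, 4,000 disks) are hypothetical, chosen only for illustration.

```python
def expected_annual_failures(fleet_size: int, rate_low: float, rate_high: float) -> tuple[float, float]:
    """Return the (low, high) expected failures per year for a fleet, given an
    annual failure-rate range expressed as fractions (0.02 means 2%)."""
    return fleet_size * rate_low, fleet_size * rate_high


# Hypothetical fleet: 1,000 servers and 4,000 disk drives.
servers_low, servers_high = expected_annual_failures(1_000, 0.02, 0.04)
disks_low, disks_high = expected_annual_failures(4_000, 0.01, 0.05)

print(f"servers: {servers_low:.0f}-{servers_high:.0f} failures/year")
print(f"disks:   {disks_low:.0f}-{disks_high:.0f} failures/year")
# At the high end of each range, how often something breaks.
print(f"one server every {365 / servers_high:.1f} days, one disk every {365 / disks_high:.1f} days")
```

Even a modest fleet sees a hardware failure every few days, which is why the following slides treat failure as the normal case.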
  28. human errors
  29. human errors: ~20% of admin issues have unintended consequences. Source: James Hamilton
  30. scalable & available
  31. assume sw/hw failure
  32. design apps to be resilient
  33. automation & alarming
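One common way to act on "assume sw/hw failure" and "design apps to be resilient" is to retry transient errors with exponential backoff and raise an alarm once retries run out. This is a generic sketch; call_with_retries, its alarm callback, and the flaky operation are hypothetical names for illustration, not an AWS API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    op: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    alarm: Callable[[Exception], None] = lambda err: None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(), retrying on any exception with exponential backoff and jitter.

    If every attempt fails, pass the last error to `alarm` (a hypothetical
    alerting hook) and re-raise it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception as err:
            if attempt == max_attempts - 1:
                alarm(err)
                raise
            # Back off base_delay * 2**attempt, scaled by random jitter so many
            # clients retrying at once do not hit the service in lockstep.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")


# Hypothetical flaky operation: fails twice with a transient error, then succeeds.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = call_with_retries(flaky, sleep=lambda seconds: None)  # no real waiting in the demo
print(result, calls["n"])
```

Injecting sleep and alarm as parameters keeps the policy testable and lets the alarm hook feed whatever monitoring system is in use.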
  34. Image: Chris Dagdigian
  35. Diagram: US East Region with Availability Zones A, B, C, and D
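Spreading an application across Availability Zones helps because a failure in one zone need not take down the others. As a rough model, assuming (strongly) that zones fail independently and each is available with the same probability a, the chance that at least one of n zones is up is 1 - (1 - a)^n. The 99% per-zone figure is hypothetical, and real zone failures are not perfectly independent, so this estimate is optimistic.

```python
def availability_any(a: float, n: int) -> float:
    """Probability that at least one of n independent zones is available,
    when each zone is available with probability a."""
    if not 0.0 <= a <= 1.0 or n < 1:
        raise ValueError("need 0 <= a <= 1 and n >= 1")
    return 1.0 - (1.0 - a) ** n


# Hypothetical per-zone availability of 99%.
for zones in (1, 2, 3, 4):
    print(zones, f"{availability_any(0.99, zones):.8f}")
```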
  36. Elastic Load Balancing, CloudWatch, Auto Scaling, SNS, SQS, Elastic IP, Elastic Block Store
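Auto Scaling driven by CloudWatch metrics lets a group of instances grow and shrink with load. A minimal sketch of the kind of threshold rule such a policy encodes; the thresholds (scale out above 70% average CPU, in below 30%) and capacity bounds are hypothetical, and this models the idea only without calling any AWS API.

```python
from statistics import mean


def desired_capacity(current: int, cpu_samples: list[float], minimum: int = 2, maximum: int = 20,
                     high: float = 70.0, low: float = 30.0) -> int:
    """Hypothetical threshold policy: add one instance when average CPU exceeds
    `high`, remove one when it falls below `low`, and always stay within
    [minimum, maximum]."""
    avg = mean(cpu_samples)
    if avg > high:
        current += 1
    elif avg < low:
        current -= 1
    return max(minimum, min(maximum, current))


print(desired_capacity(4, [82.0, 91.5, 77.0]))  # busy: scale out
print(desired_capacity(4, [12.0, 8.5, 15.0]))   # idle: scale in
print(desired_capacity(2, [5.0, 4.0]))          # already at the floor
```

The gap between the two thresholds keeps the group from oscillating when load hovers near a single cutoff.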
  37. Diagram: Controllers A, B, and C, shown in two groups (remaining labels not recoverable)
  38. cost effective: pay as you go, economies of scale, choices in pricing
  39. on-demand instances, reserved instances, spot instances
  40. on-demand instances: pay as you go, starting at $0.085/hour [rest of slide not recoverable]
  41. reserved instances: low upfront fee, lower operating cost, guaranteed capacity
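The choice between on-demand and reserved instances comes down to utilization: a reserved instance trades an upfront fee for a lower hourly rate, so it pays off once the instance runs for more than upfront_fee / (on_demand_rate - reserved_rate) hours during the term. All three prices in this sketch are hypothetical placeholders, not quoted AWS prices.

```python
def break_even_hours(upfront_fee: float, on_demand_rate: float, reserved_rate: float) -> float:
    """Hours of use after which a reserved instance becomes cheaper than on-demand."""
    if reserved_rate >= on_demand_rate:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    return upfront_fee / (on_demand_rate - reserved_rate)


ON_DEMAND = 0.085  # $/hour, hypothetical
UPFRONT = 200.0    # $ one-time fee, hypothetical
RESERVED = 0.035   # $/hour, hypothetical

hours = break_even_hours(UPFRONT, ON_DEMAND, RESERVED)
term_hours = 365 * 24  # assumed one-year term
print(f"break-even after {hours:.0f} hours ({hours / term_hours:.0%} utilization over a year)")
```

Below the break-even utilization, on-demand is cheaper; above it, the reserved instance wins.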
  42. spot instances: bid on unused capacity; price based on supply and demand; you pay the market price; instances can disappear
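Spot pricing can be simulated in a few lines: an instance runs in any hour where the market price is at or below the bid, is interrupted when the price rises above it, and is charged the market price rather than the bid. The hourly price trace and bid are made-up numbers, and this is a simplified model, not the exact billing rules.

```python
def simulate_spot(bid: float, prices: list[float]) -> tuple[int, float, int]:
    """Simplified spot model over hourly market prices.

    The instance runs (and is charged that hour's market price) only when
    price <= bid. Returns (hours_run, total_cost, interruptions).
    """
    hours_run, total_cost, interruptions = 0, 0.0, 0
    running = False
    for price in prices:
        if price <= bid:
            hours_run += 1
            total_cost += price  # pay the market price, not the bid
            running = True
        else:
            if running:
                interruptions += 1  # price rose above the bid: instance is reclaimed
            running = False
    return hours_run, total_cost, interruptions


# Hypothetical hourly spot prices ($/hour) and a hypothetical bid.
trace = [0.030, 0.031, 0.029, 0.045, 0.050, 0.032, 0.030]
hours_run, cost, interruptions = simulate_spot(0.040, trace)
print(f"ran {hours_run} h for ${cost:.3f}, interrupted {interruptions} time(s)")
```

Because instances can disappear, spot capacity suits work that can be checkpointed or restarted, such as batch jobs.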
  43. http://cloudexchange.com
  44. physical is free, network is easy, the rest can be added
  45. Diagram (top to bottom): Customer 1, Customer 2, ..., Customer n; hypervisor; virtual interfaces; per-customer security groups; firewall; physical interfaces
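The security-group layer in this diagram acts as a per-instance firewall: inbound traffic is denied unless a rule explicitly allows it. A minimal model of that allow-list check follows. The rules are hypothetical (SSH from one office range, HTTPS from anywhere), and the sketch ignores features of real security groups, such as rules that reference other groups and stateful tracking of return traffic.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    protocol: str  # "tcp" or "udp"
    port: int
    source: str    # CIDR block allowed to connect


def is_allowed(rules: list[Rule], protocol: str, port: int, src_ip: str) -> bool:
    """Default deny: traffic passes only if some rule matches protocol, port, and source."""
    addr = ip_address(src_ip)
    return any(r.protocol == protocol and r.port == port and addr in ip_network(r.source)
               for r in rules)


# Hypothetical group: SSH from one office network, HTTPS from anywhere.
web_group = [Rule("tcp", 22, "203.0.113.0/24"), Rule("tcp", 443, "0.0.0.0/0")]

print(is_allowed(web_group, "tcp", 443, "198.51.100.7"))  # HTTPS from the Internet
print(is_allowed(web_group, "tcp", 22, "198.51.100.7"))   # SSH from outside the office range
print(is_allowed(web_group, "tcp", 22, "203.0.113.15"))   # SSH from the office range
```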
  46. Amazon VPC architecture: the customer's isolated AWS resources in subnets 10.32.1.0/24, 10.32.2.0/24, and 10.32.3.0/24; a VPN gateway; a secure VPN connection over the Internet to your network; external customers
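The VPC subnets 10.32.1.0/24, 10.32.2.0/24, and 10.32.3.0/24 must each fall inside the VPC's address range and must not overlap one another. Python's standard ipaddress module can check both conditions. The enclosing 10.32.0.0/16 block is an assumption, since the slide does not state the VPC's overall range.

```python
from ipaddress import ip_network
from itertools import combinations

vpc = ip_network("10.32.0.0/16")  # assumed VPC range (not given on the slide)
subnets = [ip_network(c) for c in ("10.32.1.0/24", "10.32.2.0/24", "10.32.3.0/24")]

# Every subnet must sit inside the VPC block.
inside = all(s.subnet_of(vpc) for s in subnets)
# No two subnets may share addresses.
disjoint = not any(a.overlaps(b) for a, b in combinations(subnets, 2))

print(inside, disjoint)
for s in subnets:
    print(s, s.num_addresses, "addresses")
```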
  47. http://aws.amazon.com/security
  48. <3>
  49. AWS + science
  50. data management
  51. Biomarker warehouse: pre-clinical, clinical, and third-party data and publications. Estimated cost: 10 TB warehouse over 3 years [cost figures not recoverable]
  52. data processing
  53. http://cyclecomputing.com
  54. sudo gem install cloud-crowd http://cyclecomputing.com http://wiki.github.com/documentcloud/cloud-crowd
  55. http://www.rightscale.com
  56. Diagram: Amazon Elastic MapReduce. Input data goes to an Amazon S3 input bucket; you deploy the application through the web console or command line tools; Elastic MapReduce runs Hadoop on Amazon EC2 instances, reading the input dataset and writing results to an S3 output bucket; it notifies you at the end, and you get the results from S3.
  57. Protein interactions @ U. Washington: EC2! Simple Python scripts automate the management of 1000s of simultaneous experiments using the EC2 API. http://faculty.washington.edu/danielt/ Source: Ed Lazowska
  58. BLAT @ U. Penn: map 100 million 100-base paired-end reads. A quad core with 5 GB of RAM would take 16 days; instead: 30 high-memory instances, 32 hours, $195. Source: Angel Pizarro/John Hogenesch
  59. 200 instances, 60,000 structures, 4 hours. http://bioteam.net/aws
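The two case studies in slides 58 and 59 can be compared in instance-hours. Using only the figures on those slides (BLAT: 30 instances for 32 hours at $195; BioTeam: 200 instances, 60,000 structures, 4 hours), dividing through gives a cost per instance-hour and a throughput per instance-hour:

```python
def instance_hours(instances: int, hours: int) -> int:
    """Total compute consumed: number of instances times wall-clock hours."""
    return instances * hours


# BLAT @ U. Penn: 30 high-memory instances for 32 hours, $195 total.
blat_ih = instance_hours(30, 32)
print(f"BLAT: {blat_ih} instance-hours, ${195 / blat_ih:.3f} per instance-hour")

# BioTeam: 60,000 structures on 200 instances in 4 hours.
bio_ih = instance_hours(200, 4)
print(f"BioTeam: {bio_ih} instance-hours, {60_000 / bio_ih:.0f} structures per instance-hour")

# Slide 58's single-machine estimate for the BLAT job was 16 days.
print(f"BLAT wall-clock speedup vs. one quad-core box: {16 * 24 / 32:.0f}x")
```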
  60. Crossbow: rapid whole-genome SNP analysis. Preprocessed reads -> Map: Bowtie -> Sort: bin and partition -> Reduce: SoapSNP. Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Genome Biol 10(11): R134.
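Crossbow's pipeline follows the MapReduce pattern: the map step aligns reads (Bowtie), the sort step bins alignments by genomic position, and the reduce step calls SNPs per bin (SoapSNP). This toy sketch mimics only that data flow in plain Python. The "aligned reads" are hypothetical (chromosome, position, base) tuples standing in for Bowtie output, the bin size is arbitrary, and the reducer uses a naive majority-vote rule in place of SoapSNP's statistical model.

```python
from collections import Counter, defaultdict

BIN_SIZE = 1_000  # hypothetical genomic bin width


def map_phase(alignments):
    """Emit ((chromosome, bin), (position, base)) so each bin can be reduced independently."""
    for chrom, pos, base in alignments:
        yield (chrom, pos // BIN_SIZE), (pos, base)


def shuffle_sort(pairs):
    """Group values by key and sort by key, standing in for Hadoop's shuffle/sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values, reference):
    """Naive caller (NOT SoapSNP): report a site when the majority base differs
    from the reference base at that position."""
    chrom, _bin = key
    by_pos = defaultdict(Counter)
    for pos, base in values:
        by_pos[pos][base] += 1
    for pos in sorted(by_pos):
        base, _count = by_pos[pos].most_common(1)[0]
        if base != reference[(chrom, pos)]:
            yield chrom, pos, reference[(chrom, pos)], base


# Hypothetical reads already aligned to a tiny reference.
reference = {("chr1", 10): "A", ("chr1", 1500): "G"}
reads = [("chr1", 10, "A"), ("chr1", 10, "A"),
         ("chr1", 1500, "T"), ("chr1", 1500, "T"), ("chr1", 1500, "G")]

snps = [snp for key, vals in shuffle_sort(map_phase(reads))
        for snp in reduce_phase(key, vals, reference)]
print(snps)
```

Keying on (chromosome, bin) is what lets Hadoop spread the reduce work across many machines, since each bin is processed independently.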
  61. data storage & distribution: public & private
  62. http://aws.amazon.com/publicdatasets/
  63. sharing and collaboration
  64. Elastic-R Collaborative Research Environment: http://www.elasticr.net
  65. software/pipeline distribution
  66. http://www.cloudbiolinux.com/
  67. http://usegalaxy.org/cloud
  68. http://clovr.igs.umaryland.edu/
  69. platforms & services
  70. http://heroku.com
  71. http://chempedia.com/
  72. http://dnanexus.com
  73. Diagram: NexusDB client, https/SSL, NexusDB instances, storage server (S3). http://www.biodiscovery.com/index/biod-nexusdb-action
  74. Cloud-based biosemantic apps: http://syapse.com/
  75. <4>
  76. 2006 comparison: large service vs. mid-size. Source: James Hamilton
  77. datacenter design efficiency: PUE < 1.2-1.5*. Source: James Hamilton. *Average > 2.0 (Source: EPA)
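PUE (power usage effectiveness) is total facility power divided by the power delivered to IT equipment, so everything above 1.0 is overhead for cooling, power distribution, and similar loads. For a hypothetical 1 MW IT load, the slide's figures translate into facility power as follows (the EPA value is a floor, since the slide gives the average as above 2.0):

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power = PUE x IT equipment power."""
    return it_load_kw * pue


IT_LOAD_KW = 1_000  # hypothetical 1 MW of servers, storage, and network gear

for label, pue in (("efficient, low end", 1.2), ("efficient, high end", 1.5), ("EPA average (>2.0)", 2.0)):
    total = facility_power_kw(IT_LOAD_KW, pue)
    print(f"{label:20s} PUE {pue:.1f}: {total:,.0f} kW total, {total - IT_LOAD_KW:,.0f} kW overhead")
```

At PUE 2.0 the overhead equals the IT load itself; at 1.2 it is a fifth of it.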
  78. multiple datacenters. Source: James Hamilton
  79. h/w costs: efficiency optimization. Source: James Hamilton
  80. investments in automation, investments in s/w, investments in special skills
  81. server utilization, infrastructure as opex, pay as you go/grow
  82. • Boot from EBS • AWS Multi-Factor Authentication • US West Region • Virtual Private Cloud private beta • VPC Unlimited Beta • Lower Reserved Instance Pricing • ELB Support in Console • Reserved Instances in EU • Console Support for CloudWatch • CloudFront streaming • Elastic MapReduce • SQS in EU • EC2 Spot Instances • Windows 2008 Support • RDS Launched • Lowered Prices • New SimpleDB Features • AWS Security Center • High Memory Instances • AWS Economics Center • Console support for CloudFront • Reduced EC2 Pricing • FPS General Availability • EMR Apache Hive support • EC2 Reserved Instances • Elastic MapReduce in EU • SAS 70 Type II Audit • EC2 with Windows • AWS SDK for .NET • EC2 in EU • CloudFront Private Content • AWS Toolkit for Eclipse • EBS Shared Snapshots • APAC announced • SimpleDB in EU • Monitoring in EU • AWS Import/Export • Auto Scaling in EU • Lower pricing tiers for CloudFront • Elastic Load Balancing in EU • Monitoring, Auto Scaling, and Elastic Load Balancing • AWS Management Console • AWS Solutions Provider program • CloudFront adds access logging
  83. there is no magic, there is only awesome
  84. Thank you! deesingh@amazon.com Twitter: @mndoci. Presentation ideas from James Hamilton, @mza and @lessig
