Bio-IT World 2010 - Keynote talk
My keynote from Bio-IT World 2010

Transcript of "Bio-IT World 2010 - Keynote talk"

  1. 1. There is no magic, there is only awesome. Scientific computing with Amazon Web Services. Deepak Singh, Bio-IT World 2010
  2. 2. By ~Prescott under a CC-BY-NC license
  3. 3. <1>
  4. 4. data
  5. 5. Image: Wikipedia
  6. 6. Source: http://www.nature.com/news/specials/bigdata/index.html
  7. 7. implications
  8. 8. data management, data processing, data sharing
  9. 9. <2>
  10. 10. Consumer (Retail) business
  11. 11. Consumer (Retail) business, Seller business
  12. 12. massive scale, highly available, efficient, service oriented, secure. Amazon Infrastructure
  13. 13. infrastructure as a service
  14. 14. undifferentiated heavy lifting
  15. 15. Bandwidth Consumed by Amazon Web Services; Bandwidth Consumed by Amazon's Global Websites (chart, 2001 to 2008)
  16. 16. Amazon S3 Momentum. Total Number of Objects Stored in Amazon S3: 2.9 Billion (Q4 2006), 14 Billion (Q4 2007), 40 Billion (Q4 2008), 102 Billion (Q4 2009). Peak Requests:
  17. 17. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  18. 18. Messaging: Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS). Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Content Delivery: Amazon CloudFront. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  19. 19. Tools: AWS Toolkit for Eclipse, AWS SDK for .NET. Isolated Networks: Amazon Virtual Private Cloud. Monitoring: Amazon CloudWatch. Management: AWS Management Console. Messaging: Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS). Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Content Delivery: Amazon CloudFront. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  20. 20. Your Custom Applications and Services. Tools: AWS Toolkit for Eclipse, AWS SDK for .NET. Isolated Networks: Amazon Virtual Private Cloud. Monitoring: Amazon CloudWatch. Management: AWS Management Console. Messaging: Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS). Payments: Amazon Flexible Payments Service (FPS). On-Demand Workforce: Amazon Mechanical Turk. Parallel Processing: Amazon Elastic MapReduce. Content Delivery: Amazon CloudFront. Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling. Storage: Amazon Simple Storage Service (S3), AWS Import/Export. Database: Amazon RDS and SimpleDB.
  21. 21. scalability
  22. 22. > 1PB of data in S3
  23. 23. elasticity
  24. 24. 3000 CPU’s for one firm’s risk management application. Chart: Number of EC2 Instances by day, Wednesday 4/22/2009 through Tuesday 4/28/2009, with axis marks at 3000 and 300; annotation: 300 CPU's on weekends
  25. 25. highly available
  26. 26. “Everything fails, all the time” -- Werner Vogels
  27. 27. 2-4% of servers & 1-5% of disk drives will die annually Source: Jeff Dean, LADIS 2009
  28. 28. human errors
  29. 29. human errors: ~20% of admin issues have unintended consequences. Source: James Hamilton
  30. 30. scalable & available
  31. 31. assume sw/hw failure
  32. 32. design apps to be resilient
  33. 33. automation & alarming
  34. 34. Image: Chris Dagdigian
  35. 35. US East Region: Availability Zone A, Availability Zone B, Availability Zone C, Availability Zone D
  36. 36. elastic load balancing, CloudWatch, auto scaling, SNS, SQS, elastic IP, elastic block store
  37. 37. Tight coupling: Controller A, Controller B, Controller C. Loose coupling using queues: Controller A, Controller B, Controller C
  38. 38. cost effective: pay as you go, economies of scale, choices in pricing
  39. 39. on-demand instances, reserved instances, spot instances
  40. 40. on-demand instances: pay as you go, starting at $0.085/hour, best effort capacity
  41. 41. reserved instances: low up front fee, lower operating cost, guaranteed capacity
  42. 42. spot instances: bid on unused capacity, price based on supply and demand, you pay market price, instances can disappear
  43. 43. http://cloudexchange.com
  44. 44. physical is free, network is easy, rest can be added
  45. 45. Customer 1, Customer 2, … Customer n. Hypervisor. Virtual Interfaces. Customer 1 Security Groups, Customer 2 Security Groups, … Customer n Security Groups. Firewall. Physical Interfaces
  46. 46. AMAZON VPC ARCHITECTURE. Customer’s isolated AWS resources. Subnets: 10.32.1.0/24, 10.32.2.0/24, 10.32.3.0/24. VPN Gateway. Amazon Web Services Cloud. Secure VPN Connection over the Internet. Your Network. External Customers
  47. 47. http://aws.amazon.com/security
  48. 48. <3>
  49. 49. AWS + science
  50. 50. data management
  51. 51. Biomarker Warehouse: pre-clinical, clinical, 3rd party data and publications. Cost comparison chart: Cloud (AWS), Managed Service, In house; cost components: Servers, Storage, Overhead. Estimated cost: 10 TB warehouse over 3 years
  52. 52. data processing
  53. 53. http://cyclecomputing.com
  54. 54. sudo gem install cloud-crowd http://cyclecomputing.com http://wiki.github.com/documentcloud/cloud-crowd
  55. 55. http://www.rightscale.com
  56. 56. Amazon Elastic MapReduce. Diagram labels: Input Data, Input S3 bucket, Deploy Application (Web Console, Command line tools), Elastic MapReduce, Hadoop on Amazon EC2 Instances, input dataset, output results, Output S3 bucket, Notify, Get Results, End, Amazon S3
  57. 57. Protein interactions @ U. Washington. EC2! Simple Python scripts automate the management of 1000s of simultaneous experiments using the EC2 API. http://faculty.washington.edu/danielt/ Source: Ed Lazowska
  58. 58. BLAT @ U. Penn. Map 100 million, 100 base paired end reads. Quad core with 5 GB of RAM would take 16 days. Diagram labels: Boot master & slaves; Master; Submit the BLAT job; Upload inputs; Slave; Initial process splits fasta file; Download results; Subsequent jobs BLAT smaller files and save each result as it goes; S3. 30 high-memory instances; 32 hours; $195. Source: Angel Pizzaro/John Hogenesch
  59. 59. 200 instances, 60000 structures, 4 hours. http://bioteam.net/aws
  60. 60. Crossbow: Rapid whole genome SNP analysis. Preprocessed reads. Map: Bowtie. Sort: Bin and partition. Reduce: SoapSNP. Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Genome Biol 10(11): R134.
  61. 61. data storage & distribution: public & private
  62. 62. http://aws.amazon.com/publicdatasets/
  63. 63. sharing and collaboration
  64. 64. http://www.elasticr.net Elastic-R Collaborative Research Environment
  65. 65. software/pipeline distribution
  66. 66. http://www.cloudbiolinux.com/
  67. 67. http://usegalaxy.org/cloud
  68. 68. http://clovr.igs.umaryland.edu/
  69. 69. platforms & services
  70. 70. http://heroku.com
  71. 71. http://chempedia.com/
  72. 72. http://dnanexus.com
  73. 73. NexusDB. Diagram labels: NexusDB Client, https/SSL, NexusDB Server, NexusDB Storage (S3). http://www.biodiscovery.com/index/biod-nexusdb-action
  74. 74. Cloud-based biosemantic apps http://syapse.com/
  75. 75. <4>
  76. 76. 2006 comparison: Large Service vs. Mid Size. Source: James Hamilton
  77. 77. datacenter design efficiency: PUE < 1.2 - 1.5*. Source: James Hamilton. * average > 2.0 (Source: EPA)
  78. 78. multiple datacenters. Source: James Hamilton
  79. 79. h/w costs, efficiency optimization. Source: James Hamilton
  80. 80. investments in automation, investments in s/w, investments in special skills
  81. 81. server utilization, infrastructure as opex, pay as you go/grow
  82. 82. • Boot from EBS • AWS Multi Factor Authentication • US West Region • Virtual Private Cloud private beta • VPC Unlimited Beta • Lower Reserved Instance Pricing • ELB Support in Console • Reserved Instances in EU • Console Support for CloudWatch • CloudFront streaming • Elastic MapReduce • SQS in EU • EC2 Spot Instances • Windows 2008 Support • RDS Launched • Lowered Prices • New SimpleDB Features • AWS Security Center • High Memory Instances • AWS Economics Center • Console support for CloudFront • Reduced EC2 Pricing • FPS General Availability • EMR Apache Hive support • EC2 Reserved Instances • Elastic MapReduce in EU • SAS 70 Type II Audit • EC2 with Windows • AWS SDK for .NET • EC2 in EU • CloudFront Private Content • AWS Toolkit for Eclipse • EBS Shared Snapshots • APAC announced • SimpleDB in EU • Monitoring in EU • AWS Import/Export • Auto Scaling in EU • Lower pricing tiers for CloudFront • Elastic Load Balancing in EU • Monitoring, Auto Scaling, and Elastic Load Balancing • AWS Management Console • AWS Solutions Provider program • CloudFront adds access logging
  83. 83. there is no magic there is only awesome
  84. 84. Thank you! deesingh@amazon.com Twitter: @mndoci. Presentation ideas from James Hamilton, @mza and @lessig
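
Slide 37 contrasts tight coupling between controllers with loose coupling using queues. As a rough illustration of the loosely coupled form, the Python sketch below connects three stages only through queues, so no stage calls another directly. It uses the standard library's in-process `queue.Queue` as a stand-in for a hosted queue such as SQS; the function names `controller_a`, `controller_b`, `controller_c`, the `SENTINEL` end marker, and the toy arithmetic are all assumptions made for the example and do not come from the talk.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-work marker used only in this sketch


def controller_a(out_q, items):
    """Producer: enqueue work and return without waiting on later stages."""
    for item in items:
        out_q.put(item)
    out_q.put(SENTINEL)


def controller_b(in_q, out_q):
    """Middle stage: knows only its input queue and its output queue."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)  # pass the end marker downstream
            break
        out_q.put(item * 2)  # toy transformation


def controller_c(in_q, results):
    """Final stage: drain the queue and collect results."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        results.append(item + 1)  # toy transformation


def run_pipeline(items):
    """Wire the three stages together through two queues and run them."""
    q_ab, q_bc = queue.Queue(), queue.Queue()
    results = []
    threads = [
        threading.Thread(target=controller_a, args=(q_ab, items)),
        threading.Thread(target=controller_b, args=(q_ab, q_bc)),
        threading.Thread(target=controller_c, args=(q_bc, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

Because each stage depends only on a queue, a stage can be restarted, slowed down, or replicated without changing the others, which is the property the slide attributes to loose coupling.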

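
Slide 27 cites annual failure rates of 2-4% for servers and 1-5% for disk drives (Jeff Dean, LADIS 2009). To make "everything fails, all the time" concrete, the sketch below turns those rates into expected failure counts for a fleet. The fleet sizes are hypothetical, and the probability helper assumes failures are independent and occur at a constant rate through the year, which is a simplification rather than anything stated in the talk.

```python
def expected_annual_failures(count, low_rate, high_rate):
    """Return the (low, high) expected number of failures per year."""
    return count * low_rate, count * high_rate


def p_at_least_one_failure(count, annual_rate, days=1):
    """Probability that at least one of `count` components fails within
    `days`, assuming independent failures at a constant rate."""
    survive_one = (1 - annual_rate) ** (days / 365)
    return 1 - survive_one ** count


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 servers and 10,000 disk drives.
    servers = expected_annual_failures(1000, 0.02, 0.04)
    disks = expected_annual_failures(10000, 0.01, 0.05)
    print("expected server failures per year:", servers)
    print("expected disk failures per year:", disks)
    print("P(at least one server fails today):",
          p_at_least_one_failure(1000, 0.02, days=1))
```

At 2-4% per year, a hypothetical fleet of 1,000 servers loses 20 to 40 machines annually, which is why the following slides argue for assuming failure and designing applications to be resilient.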