Scientific Computing with Amazon Web Services
Deepak Singh




NHGRI Cloud Computing Meeting, Baltimore, 2010
AWS + science = win
scale has implications
data management
data processing
data sharing
Image: Chris Dagdigian
amazon web services
the cloud
has_many :definitions
infrastructure as a service
Your Custom Applications and Services

Monitoring: Amazon CloudWatch
Management: AWS Management Console
Tools: AWS Toolkit for Eclipse, AWS Toolkit for .NET
Isolated Networks: Amazon Virtual Private Cloud

Parallel Processing: Amazon Elastic MapReduce
Content Delivery: Amazon CloudFront
Messaging: Amazon Simple Queue Service (SQS)
Payments: Amazon Flexible Payments Service (FPS)
On-Demand Workforce: Amazon Mechanical Turk

Compute: Amazon Elastic Compute Cloud (EC2)
- Elastic Load Balancing
- Auto Scaling
Storage: Amazon Simple Storage Service (S3)
- AWS Import/Export
Database: Amazon RDS and SimpleDB
• Boot from EBS
• AWS Multi Factor Authentication
• US West Region
• Virtual Private Cloud private beta
• VPC Unlimited Beta
• Lower Reserved Instance Pricing
• ELB Support in Console
• Reserved Instances in EU
• Console Support for CloudWatch
• CloudFront streaming
• Elastic MapReduce
• SQS in EU
• EC2 Spot Instances
• Windows 2008 Support
• RDS Launched
• Lowered Prices
• New SimpleDB Features
• AWS Security Center
• High Memory Instances
• AWS Economics Center
• FPS General Availability
• Console support for CloudFront
• Reduced EC2 Pricing
• EMR Apache Hive support
• EC2 Reserved Instances
• Elastic MapReduce in EU
• SAS 70 Type II Audit
• EC2 with Windows
• AWS SDK for .NET
• EC2 in EU
• CloudFront Private Content
• AWS Toolkit for Eclipse
• EBS Shared Snapshots
• APAC announced
• SimpleDB in EU
• Monitoring in EU
• AWS Import/Export
• Auto Scaling in EU
• Lower pricing tiers for CloudFront
• Elastic Load Balancing in EU
• AWS Management Console
• Monitoring, Auto Scaling, and Elastic Load Balancing
• AWS Solutions Provider program
• CloudFront adds access logging
elasticity
3000 CPUs for one firm’s risk management application
[Chart: Number of EC2 Instances per day, Wednesday 4/22/2009 through Tuesday 4/28/2009. Axis marks at 3000 and 300. Annotation: 300 CPUs on weekends.]
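The elasticity chart contrasts a 3000-CPU peak with roughly 300 CPUs on weekends. A rough sketch of what that means in instance-hours follows. It assumes, purely for illustration, that each day's count stays flat for 24 hours and that every weekday runs at the full peak; the chart itself shows a curve, not these exact figures.

```python
HOURS_PER_DAY = 24

def instance_hours(daily_instances):
    """Total instance-hours, assuming each day's count is held flat for 24 hours."""
    return sum(n * HOURS_PER_DAY for n in daily_instances)

# One week, Wednesday through Tuesday (assumed shape):
# weekdays at the 3000 peak, Saturday and Sunday at 300.
elastic_week = [3000, 3000, 3000, 300, 300, 3000, 3000]

# The alternative: provision for the peak all week long.
fixed_week = [max(elastic_week)] * len(elastic_week)

elastic_hours = instance_hours(elastic_week)
fixed_hours = instance_hours(fixed_week)
saved_fraction = 1 - elastic_hours / fixed_hours

print(f"fixed:   {fixed_hours} instance-hours")
print(f"elastic: {elastic_hours} instance-hours")
print(f"saved:   {saved_fraction:.1%}")
```

Under these assumptions, scaling down for the weekend avoids about a quarter of the week's instance-hours. A workload with a sharper peak would save proportionally more.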
scale
> 1PB of data in S3
high availability
Image: Chris Dagdigian
“Everything fails, all the time”
                   -- Werner Vogels
“Things will crash. Deal with it”
                        -- Jeff Dean
2-4% of servers will die annually



Source: Jeff Dean, LADIS 2009
1-5% of disk drives will die every year



Source: Jeff Dean, LADIS 2009
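To make the quoted annual failure rates concrete, the sketch below turns them into expected failure counts. The fleet sizes (10,000 servers, 50,000 disks) are hypothetical values chosen for illustration and do not come from the talk.

```python
WEEKS_PER_YEAR = 52

def expected_failures(fleet_size, low_rate, high_rate):
    """Expected annual failures as a (low, high) range for a given fleet size."""
    return fleet_size * low_rate, fleet_size * high_rate

# Hypothetical fleet sizes; the rates are the 2-4% and 1-5% ranges quoted above.
servers_low, servers_high = expected_failures(10_000, 0.02, 0.04)
disks_low, disks_high = expected_failures(50_000, 0.01, 0.05)

print(f"servers: {servers_low:.0f} to {servers_high:.0f} failures per year")
print(f"disks:   {disks_low:.0f} to {disks_high:.0f} failures per year")
print(f"disks:   {disks_low / WEEKS_PER_YEAR:.1f} to "
      f"{disks_high / WEEKS_PER_YEAR:.1f} failures per week")
```

At this assumed scale, hardware failure is a routine weekly event, which is the point of the quotes above.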
human errors
~20% of admin issues have unintended consequences




Source: James Hamilton
scalable & available
assume sw/hw failure
design apps to be resilient
automation & alarming
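One common way to "assume failure" at the application level is to retry transient errors with exponential backoff. The sketch below is a generic illustration, not an AWS API: `call_with_retries` and `make_flaky` are hypothetical helper names, and the simulated outage stands in for any remote call that can fail.

```python
import random
import time

def call_with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying failures with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error so an alarm can fire
            # Delay doubles each attempt; jitter keeps many clients from retrying in sync.
            delay = base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            sleep(delay)

def make_flaky(failures_before_success):
    """Build a stand-in operation that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("simulated outage")
        return f"ok after {state['calls']} calls"

    return operation

# Skip real sleeping in this demo by passing a no-op sleep function.
result = call_with_retries(make_flaky(2), sleep=lambda _: None)
print(result)
```

The final failure is re-raised rather than swallowed, so monitoring and alarming can still see errors that retries could not absorb.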
US East Region
Availability Zone A
Availability Zone B
Availability Zone C
Availability Zone D
elastic load balancing
CloudWatch
auto scaling
SQS
elastic IP
elastic block store
flexibility
on-demand instances
 reserved instances
   spot instances
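A simple way to compare on-demand and reserved instances is a break-even calculation: how many hours of use justify an upfront reservation fee. The sketch below assumes a pricing model of one upfront fee plus a lower hourly rate; all prices are made-up placeholders, not actual AWS pricing. Spot instances are left out because their price varies with demand.

```python
HOURS_PER_YEAR = 24 * 365

def break_even_hours(upfront_fee, reserved_hourly, on_demand_hourly):
    """Hours of use after which a reserved instance is cheaper than on-demand."""
    if on_demand_hourly <= reserved_hourly:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    return upfront_fee / (on_demand_hourly - reserved_hourly)

# Hypothetical prices for illustration only, NOT real AWS rates.
hours = break_even_hours(upfront_fee=200.0, reserved_hourly=0.03, on_demand_hourly=0.08)
utilization = hours / HOURS_PER_YEAR

print(f"break-even after about {hours:.0f} hours")
print(f"that is about {utilization:.0%} utilization over one year")
```

With these placeholder numbers, a workload that runs less than about half the year is better served by on-demand capacity, while a steady workload favors a reservation.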
some implications
computing platforms
sudo gem install cloud-crowd

http://cyclecomputing.com
http://wiki.github.com/documentcloud/cloud-crowd
http://www.rightscale.com
Amazon Elastic MapReduce

[Diagram: Elastic MapReduce workflow]
1. Deploy Application (Web Console, Command line tools)
2. Input Data goes to the Input S3 bucket (Amazon S3)
3. Elastic MapReduce runs Hadoop on Amazon EC2 Instances: input dataset in, output results out
4. Results land in the Output S3 bucket (Amazon S3)
5. Notify, Get Results, End
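The map, shuffle, and reduce phases that Hadoop runs in this workflow can be sketched in plain Python. This is an in-memory illustration of the programming model only, not the Elastic MapReduce API: on the service, the input dataset and output results would live in S3 buckets and the phases would run across EC2 instances. The function names and the tiny sequence-like input are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: collapse all counts for one key into a total."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

# Stand-in for the input dataset read from the input bucket.
input_dataset = ["ACGT ACGT TTGA", "ttga acgt"]
output_results = run_job(input_dataset)
print(output_results)
```

Because each mapper sees one line and each reducer sees one key, both phases can be spread over as many machines as the job needs.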
application platforms
http://heroku.com
http://chempedia.com/
Image: O’Reilly Radar
software distribution
http://www.cloudbiolinux.com/
http://bitbucket.org/galaxy/galaxy-central/wiki/Home
data distribution
http://aws.amazon.com/publicdatasets/
to conclude
built for scale
built for availability
shared dataspaces
common namespaces
task-based resources
new software architectures
new computing platforms
Data Platform




App Platform
available today
http://aws.amazon.com/education
Thank you!




deesingh@amazon.com Twitter: @mndoci
Presentation ideas from James Hamilton, @mza, and @lessig
