Amazon resources for bioinformatics
               Brad Chapman

  Bioinformatics Interest Group, 18 Oct 2012
Goals


        Automate:
            Reduce steps
            Remove activation energy
            Increase abstraction
        Improve:
            Sharing
            Reproducibility
            Teaching
Installation
Easier installation
No installation
Challenge



     Biology computing platform
     Widely accessible
     Customizable
     Community driven
General cloud frameworks




  http://aws.amazon.com/
Not only Amazon




  http://gigaom.com/cloud/what-google-compute-engine-means-for-cloud-computing/
CloudBioLinux



       Amazon image with bioinformatics software and
       libraries
       Automated build framework
       Community effort to maintain and extend
  http://cloudbiolinux.org
CloudMan



      SGE cluster plus automation
      Web interface and monitoring
      Persistence and sharing
      Powers the Galaxy Cloud offering
 http://usecloudman.org/
BioCloudCentral



       Automate setup of Amazon instance
       Launch CloudBioLinux and CloudMan
       Provide easy ssh access, no key pairs
  http://biocloudcentral.org
Galaxy




  http://usegalaxy.org
Acknowledgments


     CloudBioLinux: Ntino Krampis, Tim Booth,
     Dawn Field, Pjotr Prins, John Chilton and
     the CloudBioLinux community
     CloudMan: Enis Afgan, James Taylor
     BioCloudCentral: Enis Afgan, John Chilton,
     Dannon Baker
Documentation




  http://cda.currentprotocols.com/WileyCDA/CPUnit/refId-bi1109.html
What we'll do


    1   Sign up for Amazon
    2   Start a CloudBioLinux/CloudMan instance
    3   Add nodes to create a compute cluster
    4   Run variant calling pipeline

  Everything done through the web
Getting started


  Sign up for Amazon Web Services
  http://aws.amazon.com

  Get security credentials: Access Key and Secret Key
  http://portal.aws.amazon.com/gp/aws/securityCredentials
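
  The Access Key and Secret Key are sensitive, so it helps to keep them out
  of scripts and notes. A minimal Python sketch, assuming the keys are
  exported as the standard AWS environment variables AWS_ACCESS_KEY_ID and
  AWS_SECRET_ACCESS_KEY; the helper name load_aws_credentials is
  hypothetical and not part of any of the tools in this talk:

```python
import os


def load_aws_credentials(environ=None):
    """Return (access_key, secret_key) read from an environment-style mapping.

    Hypothetical helper: it only checks that both values are present,
    it does not contact Amazon to verify them.
    """
    env = os.environ if environ is None else environ
    access_key = env.get("AWS_ACCESS_KEY_ID", "").strip()
    secret_key = env.get("AWS_SECRET_ACCESS_KEY", "").strip()

    # Report every missing variable at once instead of failing on the first.
    missing = [
        name
        for name, value in (
            ("AWS_ACCESS_KEY_ID", access_key),
            ("AWS_SECRET_ACCESS_KEY", secret_key),
        )
        if not value
    ]
    if missing:
        raise ValueError("Missing credentials: " + ", ".join(missing))
    return access_key, secret_key
```

  Calling load_aws_credentials() with no arguments reads the current
  process environment; passing a dictionary makes it easy to try out
  with placeholder values.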
Launch: http://biocloudcentral.org
Ready two minutes later
Log in to CloudMan
Shared CloudMan images


        Package a complete analysis environment
               Data
               Customizations
        Sharable with other users
        Share string with NGS analysis platform:
  cm-b53c6f1223f966914df347687f6fc818/shared/2012-07-23--19-23/
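
  The share string above appears to have three parts: a bucket-like name
  starting with "cm-", the literal "shared", and a timestamp. A small
  Python sketch that splits it, assuming that reading of the layout is
  right; parse_share_string and ShareString are hypothetical names, not
  part of CloudMan:

```python
from datetime import datetime
from typing import NamedTuple


class ShareString(NamedTuple):
    bucket: str              # e.g. "cm-b53c6f12..."
    snapshot_time: datetime  # parsed from "YYYY-MM-DD--HH-MM"


def parse_share_string(share):
    """Split a share string of the assumed form <bucket>/shared/<timestamp>/."""
    parts = share.strip().strip("/").split("/")
    if len(parts) != 3 or parts[1] != "shared":
        raise ValueError("Expected <bucket>/shared/<timestamp>/, got: %r" % share)
    bucket, _, stamp = parts
    if not bucket.startswith("cm-"):
        raise ValueError("Bucket name does not start with 'cm-': %r" % bucket)
    # strptime raises ValueError itself if the timestamp is malformed.
    return ShareString(bucket, datetime.strptime(stamp, "%Y-%m-%d--%H-%M"))


parsed = parse_share_string(
    "cm-b53c6f1223f966914df347687f6fc818/shared/2012-07-23--19-23/"
)
```

  A check like this catches copy-and-paste errors before the string is
  pasted into the launch form.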
Start CloudMan
CloudMan console
CloudMan admin page
CloudMan: managing a cluster
Associated Galaxy instance
Analysis data on shared instance
Graphical variant-calling pipeline
Analysis data linked to pipeline
Configure pipeline
Run pipeline
Shut everything down
What happened


    1   Sign up for Amazon
    2   Start a CloudBioLinux/CloudMan instance
    3   Add nodes to create a compute cluster
    4   Run variant calling pipeline

  Everything done through the web
ssh to the machine



  $ ssh ubuntu@184.73.104.51
  ubuntu@184.73.104.51's password:
  Welcome to Ubuntu 12.04 LTS
  (GNU/Linux 3.2.0-23-virtual x86_64)

  ubuntu@ip-10-72-197-11:~$
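
  The same login can be scripted. A minimal Python sketch that builds the
  ssh command shown above as an argument list, assuming the default
  "ubuntu" user from the session; build_ssh_command is a hypothetical
  helper, and the password prompt is still answered interactively:

```python
import shlex


def build_ssh_command(host, user="ubuntu"):
    """Return the argv list for an interactive ssh login to the instance."""
    if not host or any(ch.isspace() for ch in host):
        raise ValueError("Host must be a non-empty name or IP without spaces")
    return ["ssh", "%s@%s" % (user, host)]


command = build_ssh_command("184.73.104.51")
# Show the command as it would be typed in a shell.
print(shlex.join(command))
```

  Passing the list to subprocess.call(command) would start the session
  from Python without going through a shell.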
NX graphical client: login




  http://www.nomachine.com/download.php
NX graphical client: desktop
Summary


 Use cloud resources to build:
     Machines with standard software
     Cluster management
     Analysis pipelines
     Reproducible, sharable instances
     Web-based interfaces
