Cloud ntino-krampis



  1. Cloud BioLinux: pre-configured and on-demand computing for genomics independently of institutional, geographic or economic boundaries
     Ntino Krampis, PhD
     JCVI-NIAID workshop 2011, S. Africa
  2. Expensive sequencing and large organizations
     ● large sequencing center, multi-million, broad-impact sequencing projects
     ● dedicated bioinformatics department, coordination with other centers
     Commodity sequencing and small labs
     ● small-factor, bench-top sequencer available: GS Junior by 454
     ● sequencing as a standard technique in basic biology and genetics research
     ● RNA-seq and ChIP-seq, and each biologist will be tackling a metagenome
  3. Acquiring the sequence data is only the first step
     ● downstream bioinformatics analysis for scientific discovery
     ● many commonly used bioinformatics tools are difficult to install
     ● usually available only as source code, which needs technical expertise
     ● large-scale sequence data analysis requires high-performance, expensive computing hardware
  4. Alternative: computational capacity on the cloud
     ● Cloud Computing: large-scale, high-performance computers accessible through the Internet
     ● Example: using Gmail, Google Docs, Yahoo! Mail, Facebook etc. you store and access data on a remote computer
     ● Cloud Computing services such as Amazon EC2: rent high computational and data storage capacity on remote computers
  5. How does Cloud Computing work?
     ● operating system, bioinformatics software and data are installed in a Virtual Machine (VM)
     ● a VM is uploaded and executed on a cloud computing service
     ● run a practically unlimited number of VMs for large-scale sequence data analysis
     ● access the VM on a desktop computer through the Internet
     [Diagram: local desktop computers connect through the Internet to VMs running on the remote Amazon EC2 Cloud Computing service]
  6. Cloud BioLinux
     ● Cloud BioLinux leverages VM technology and the cloud, offering pre-configured bioinformatics computing
     ● allows setting up a high-performance data analysis environment without any technical expertise
     ● researchers can perform large-scale data analysis by simply using a desktop computer with Internet access
     ● accessible without any institutional, economic or national boundaries
  7. Launching Cloud BioLinux
     1. Sign up for an Amazon EC2 cloud account:
        You can also connect an existing account from the main website for the cloud usage charges.
        We have an account ready for you:
        Username:
        Password: Nhg4|CL0ud!
     2. Using the account credentials, sign in to the EC2 cloud console (select EC2 in the dropdown menu below the sign-in button):
        Launch Cloud BioLinux through the cloud console wizard.
  8. Launching Cloud BioLinux
     Click the button:
  9. Launch instance wizard: steps 1 & 2
     1. specify the Cloud BioLinux identifier under the “Community AMIs” tab
     2. computational capacity: memory, processor, CPU cores
  10. Launch instance wizard: step 3
      3. specify a login password for the Cloud BioLinux desktop in the “User Data” box
      4. remaining steps: leave everything as default and keep clicking the “Continue” button until the wizard finishes and you are back at the console
  11. Launching Cloud BioLinux
      Back at the console after completing the wizard: pick a running instance, select it with your mouse and copy its “Public DNS” address (the Cloud BioLinux server address on the cloud).
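The wizard choices on the slides above (AMI identifier, computational capacity, desktop password in the “User Data” box) can also be written down as a launch request. The following is only a minimal sketch, not the workshop's own procedure: the function name `build_launch_request`, the placeholder AMI ID `ami-00000000`, the instance type `m1.large`, and the idea of passing the password as plain user-data text are all assumptions made for illustration. The dictionary keys follow the keyword arguments of boto3's `run_instances` call.

```python
def build_launch_request(ami_id, instance_type, desktop_password):
    """Collect the wizard choices into keyword arguments for an EC2 launch call."""
    if not ami_id.startswith("ami-"):
        raise ValueError("expected an AMI identifier such as 'ami-...'")
    if not desktop_password:
        raise ValueError("a desktop login password is required")
    return {
        "ImageId": ami_id,               # wizard step 1: Cloud BioLinux AMI (placeholder here)
        "InstanceType": instance_type,   # wizard step 2: computational capacity
        "MinCount": 1,                   # launch exactly one instance
        "MaxCount": 1,
        "UserData": desktop_password,    # wizard step 3: assumed plain-text password
    }


# Hypothetical values; the real Cloud BioLinux AMI ID is not given in the slides.
request = build_launch_request("ami-00000000", "m1.large", "workshop")
print(request["ImageId"], request["InstanceType"])

# With AWS credentials configured, the request could be submitted as:
#   import boto3
#   boto3.client("ec2").run_instances(**request)
```

Keeping the request as a plain dictionary means the choices can be reviewed before anything is launched, and therefore before any usage charges apply.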
  12. While waiting for Cloud BioLinux to boot up...
      ● examples of NCBI public datasets on EC2
      ● bringing the data to the compute
  13. Final step: connecting remotely to Cloud BioLinux
      Click the NX client icon on your computer's desktop.
      A. paste the DNS address in the “Host” box
      B. select “Unix”, “Gnome” and the remote desktop size
      C. Login: “ubuntu” is the default user, “workshop” is the password we set
  14. What if I want to share my alignments with a collaborator?
      ● save your data as a new VM
      ● storage costs $0.10 / GB / month; at 15 GB, that is $1.50 / month
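The storage arithmetic on this slide can be checked with a few lines of Python. This is a small sketch using the slide's rate of $0.10 per GB per month; the function name `monthly_storage_cost` is made up for illustration, and current cloud prices may differ from the 2011 figure.

```python
def monthly_storage_cost(size_gb, price_per_gb_month=0.10):
    """Monthly cost in dollars of keeping a saved VM of the given size."""
    if size_gb < 0:
        raise ValueError("size must not be negative")
    # Round to cents to avoid floating-point noise in the result.
    return round(size_gb * price_per_gb_month, 2)


print(monthly_storage_cost(15))  # 1.5, matching the slide's 15 GB example
```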
  15. Cloud BioLinux whole system snapshot exchange
      ● share your analysis results publicly or only with your collaborators
      ● authorized users can access the cloud VM/image with all the software, data and analysis results
  16. Cloud BioLinux and Genomic Standards: whole system snapshot exchange
      [Diagram: researcher A and researcher B each follow the cycle start VM/image → perform analysis → snapshot → share, with the shared snapshot becoming the other researcher's starting VM/image]
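The two sharing modes mentioned on the slides above (public, or only with collaborators) can be sketched as the permission structure that an EC2 image-sharing call accepts. This is an illustrative sketch under assumptions, not the procedure used in the talk: the function name `launch_permission` and the 12-digit account ID are hypothetical, and the dictionary shape follows the `LaunchPermission` argument of boto3's `modify_image_attribute`.

```python
def launch_permission(collaborator_account_ids=None):
    """Build a LaunchPermission value: named accounts if given, otherwise public."""
    if collaborator_account_ids:
        # Share only with the listed AWS accounts (the collaborators).
        return {"Add": [{"UserId": account_id} for account_id in collaborator_account_ids]}
    # No accounts listed: make the image launchable by everyone.
    return {"Add": [{"Group": "all"}]}


private_share = launch_permission(["123456789012"])  # hypothetical account ID
public_share = launch_permission()
print(private_share)
print(public_share)

# With AWS credentials configured and a saved image ID at hand, this could be applied as:
#   import boto3
#   boto3.client("ec2").modify_image_attribute(
#       ImageId="ami-00000000",  # placeholder image ID
#       LaunchPermission=private_share,
#   )
```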
  17. Acknowledgments & Credits
      Brad Chapman - development of the fabric scripts and community organizer
      Tim Booth, Bela Tiwari, Dawn Field - BioLinux 6.0 development and EC2 documentation
      Deepak Singh and AWS - education grant supporting ISMB / BOSC workshop
      Justin Johnson - community and sponsorship of cloudbiolinux.com
      J. Craig Venter Inst. - time allowed to work on an open-source project
      D. Gomez, E. Navarro, J. Shao, I. Singh - JCVI technology innovation
      Members of the Cloud BioLinux community: Enis Afgan, Michael Heuer, Richard Holland, Mark Jensen, Dave Messina, Steffen Möller, Roman Valls
      Thank you!