Cloud BioLinux S.Africa


Published in: Technology, Business

  1. 1. Cloud BioLinux: pre-configured and on-demand computing for genomics without institutional, geographic or economic boundaries. Ntino Krampis, PhD. JCVI-NIAID-UL workshop, S. Africa, 2011
  2. 2. Low-cost sequencing technology: a new generation of small-form-factor, bench-top sequencers
  3. 3. example: GS Junior by 454
  4. 4. sequencing is becoming standard in biology and genetics research
  5. 5. besides whole genomes: RNA-seq, ChIP-seq, and metagenomics
  6. 6. Acquiring the sequence data is only the first step: downstream bioinformatic analysis is required for scientific discovery
  7. 7. Problem 1: sequence data analysis requires high-performance
  8. 8. and expensive computing hardware
  9. 9. Problem 2: many commonly used bioinformatics tools are difficult to install,
  10. 10. usually available only as source code, so installing them needs technical expertise
  11. 11. Solving problem 1: computational capacity on the cloud. Cloud computing: high-performance computers and data storage, remotely accessible through the Internet
  12. 12. we are all using the cloud: Gmail, Google Docs, Yahoo! Mail, Facebook; you store and access data on a remote computer
  13. 13. cloud computers are rented pay-as-you-go from service providers such as Amazon Elastic Compute Cloud (EC2)
  14. 14. Cloud computing with Amazon EC2: a subsidiary company of …, pay-as-you-go cloud computing. Additional services besides computing and storage:
  15. 15. cloud computers cost $0.085 - $2 per hour (max 64 GB memory and 8 processors)
  16. 16. used by companies that need additional computers without investing in hardware
  17. 17. physical locations: US East / West regions, EU, Singapore, Japan; researchers
  18. 18. work on the closest location, then distribute results world-wide
  19. 19. democratizes access to computing resources outside of institutional, economic or national boundaries. 750 hours free for new users!
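
The pay-as-you-go pricing above lends itself to a quick back-of-the-envelope estimate. The following is a minimal sketch, assuming a flat hourly rate and ignoring the free hours for new users, storage, and data transfer; the function name and the 8-hour workload are hypothetical, and only the two hourly rates come from the slide.

```python
# Hourly price range quoted on the slide (USD per instance-hour).
LOW_RATE_USD = 0.085
HIGH_RATE_USD = 2.00


def compute_cost(hours, rate_usd_per_hour, instances=1):
    """Return the cost in USD of running `instances` VMs for `hours` each.

    Assumes a flat hourly rate; real bills may include other charges.
    """
    if hours < 0 or instances < 0:
        raise ValueError("hours and instances must be non-negative")
    return round(hours * rate_usd_per_hour * instances, 2)


if __name__ == "__main__":
    # Hypothetical workload: an 8-hour analysis on a single VM,
    # priced at each end of the quoted range.
    print("cheapest VM:", compute_cost(8, LOW_RATE_USD), "USD")
    print("largest VM: ", compute_cost(8, HIGH_RATE_USD), "USD")
```

Because the cost scales linearly with both hours and the number of VMs, running several VMs in parallel for a shorter time costs the same as one VM for longer under this simple model.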
  20. 20. How does cloud computing work? The operating system, bioinformatics tools and data are installed on a Virtual Machine (VM)
  21. 21. a VM is uploaded to the cloud; it runs using on-demand computing capacity from the EC2 cloud service
  22. 22. can be accessed world-wide through a desktop / laptop computer with Internet access
  23. 23. removes the need for local computing infrastructure at each laboratory (diagram: local desktop computers connect over the Internet to VMs on the remote Amazon EC2 cloud computing service)
  24. 24. Solving problem 2: Cloud BioLinux. Bioinformatics tools are difficult to install
  25. 25. Cloud BioLinux offers a VM on the cloud with 100+ pre-installed and configured bioinformatics tools
  26. 26. sequence analysis, de novo assembly, annotation, phylogeny, molecular modeling, gene expression
  27. 27. a researcher can initiate a practically unlimited number of VMs for large-scale data analysis
  28. 28. Starting our tutorial: using the cloud. Sign in to the Amazon EC2 cloud control console. Username: [email_address] Password: SAcloud!
  29. 29. Launch Cloud BioLinux through the EC2 cloud console: click the Launch Instance button
  30. 30. Cloud BioLinux launch wizard, steps 1 & 2: (1) go to the “Community AMIs” tab and specify the Cloud BioLinux identifier ami-6011e409, then click to continue; (2) select computational capacity: Large (2 CPU cores, 7.5 GB memory), then click to continue
  31. 31. Cloud BioLinux launch wizard, step 3: (3) specify a password (“workshop”) for login to Cloud BioLinux in the “User Data” box, then click to continue
  32. 32. Cloud BioLinux launch wizard, steps 4 & 5: (4) enter a value to uniquely identify your individual Cloud BioLinux VM, then click to continue; (5) select “Proceed without a Key Pair”, then click to continue
  33. 33. Cloud BioLinux launch wizard, steps 6 & 7: (6) choose the default security group, then click to continue; (7) are we all on the final screen? Click to finish the wizard
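
The seven wizard steps map fairly directly onto the parameters of a single launch request. The sketch below only builds that request as a plain dictionary and does not contact AWS. It rests on several assumptions that are not in the slides: that “Large” corresponds to the `m1.large` instance type, that the identifying value from step 4 is stored as a `Name` tag, that the AMI is still published, and that the dictionary would be passed to boto3's `run_instances` call. The helper name `build_launch_request` is hypothetical.

```python
# AMI identifier and password are taken from the slides.
CLOUD_BIOLINUX_AMI = "ami-6011e409"
WORKSHOP_PASSWORD = "workshop"


def build_launch_request(vm_name,
                         password=WORKSHOP_PASSWORD,
                         ami_id=CLOUD_BIOLINUX_AMI,
                         instance_type="m1.large"):  # assumption: "Large" = m1.large
    """Return keyword arguments mirroring wizard steps 1-6.

    Nothing is sent to AWS; the result is a plain dict.
    """
    if not vm_name:
        raise ValueError("vm_name must uniquely identify your VM")
    return {
        "ImageId": ami_id,                # step 1: Community AMI identifier
        "InstanceType": instance_type,    # step 2: computational capacity
        "UserData": password,             # step 3: login password in "User Data"
        "TagSpecifications": [{           # step 4: assumed to be a Name tag
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": vm_name}],
        }],
        # step 5: "Proceed without a Key Pair" -> no KeyName entry at all
        "SecurityGroups": ["default"],    # step 6: default security group
        "MinCount": 1,
        "MaxCount": 1,
    }


if __name__ == "__main__":
    request = build_launch_request("alice-biolinux")  # hypothetical VM name
    for key, value in request.items():
        print(key, "=", value)
```

With AWS credentials configured, such a dictionary could in principle be handed to `boto3.client("ec2").run_instances(**request)`; that call is deliberately left out here so the sketch runs without an account.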
  34. 34. Cloud BioLinux launch status: the wizard completes and we return to the console; the VM takes a few minutes to launch and will be in the “pending” (yellow) state
  35. 35. While waiting for Cloud BioLinux to boot up: public datasets on Amazon EC2
  36. 36. GenBank and Ensembl databases, the 1000 Genomes Project, influenza
  37. 37. data are hosted for free; users pay only for the computing time used
  38. 38. community program:
  39. 39. advantage: putting the data where computational capacity is available
  40. 40. Amazon EC2 education-research grants. Any questions before we get to the exercises?
  41. 41. Final step: in the console click “Instances”, find your unique Cloud BioLinux VM using the name you specified in step 4, and copy its “Public DNS” (server address / URL on the cloud)
  42. 42. Connecting remotely to Cloud BioLinux: click the NX client icon on your computer's desktop, then (A) paste the DNS in the “Host” box; (B) select “Unix”, “Gnome”, and the remote desktop size; (C) log in as “ubuntu”, the default user, with “workshop”, the password we set
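
The same lookup can be done programmatically. This is a minimal sketch, assuming the VM was tagged with a `Name` tag and that the input has the shape of a boto3 `describe_instances` response; the sample response, the VM names, and the DNS name below are made up for illustration, and the helper name `find_public_dns` is hypothetical.

```python
def find_public_dns(response, vm_name):
    """Return the Public DNS of the VM tagged Name=vm_name.

    Returns None if the VM is not found or is not yet "running"
    (for example while it is still in the "pending" state).
    """
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if tags.get("Name") != vm_name:
                continue
            if instance.get("State", {}).get("Name") != "running":
                return None
            return instance.get("PublicDnsName") or None
    return None


# Made-up response in the shape returned by describe_instances.
SAMPLE_RESPONSE = {
    "Reservations": [
        {"Instances": [{
            "InstanceId": "i-00000001",
            "State": {"Name": "running"},
            "PublicDnsName": "ec2-203-0-113-10.compute-1.amazonaws.com",
            "Tags": [{"Key": "Name", "Value": "alice-biolinux"}],
        }]},
        {"Instances": [{
            "InstanceId": "i-00000002",
            "State": {"Name": "pending"},
            "PublicDnsName": "",
            "Tags": [{"Key": "Name", "Value": "bob-biolinux"}],
        }]},
    ]
}

if __name__ == "__main__":
    print(find_public_dns(SAMPLE_RESPONSE, "alice-biolinux"))
    print(find_public_dns(SAMPLE_RESPONSE, "bob-biolinux"))
```

The returned address is what gets pasted into the “Host” box of the NX client; a `None` result simply means the VM needs a little longer to boot.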
  45. 45. Two S. aureus strains and one S. carnosus species: drag & drop the .fna files onto the Cloud BioLinux desktop
  57. 57. Save and share the Virtual Machine (VM) containing your analysis results with a collaborator; storage costs: $0.10 / GB / month
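
The quoted storage price makes the cost of keeping a saved VM easy to estimate. A minimal sketch, assuming the flat rate from the slide with no tiering or minimum charges; the function name and the 20 GB example size are hypothetical.

```python
# Storage price quoted on the slide (USD per GB per month).
STORAGE_RATE_USD_PER_GB_MONTH = 0.10


def storage_cost(size_gb, months=1, rate=STORAGE_RATE_USD_PER_GB_MONTH):
    """Return the cost in USD of storing `size_gb` gigabytes for `months` months."""
    if size_gb < 0 or months < 0:
        raise ValueError("size_gb and months must be non-negative")
    return round(size_gb * rate * months, 2)


if __name__ == "__main__":
    # Hypothetical example: a 20 GB saved VM kept for three months.
    print(storage_cost(20, months=3), "USD")
```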
  58. 58. Cloud BioLinux: whole system snapshot exchange. Authorize access to the VM: public or for certain users; other researchers can access the VM with all the software, data, and analysis results directly on the cloud
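
The two sharing modes, public or restricted to certain users, can be expressed as launch permissions on a saved machine image. The sketch below only builds the request and does not contact AWS. It assumes the saved VM is an AMI and that the dictionary would be passed to boto3's `modify_image_attribute` call; the image identifier and account number are placeholders, and the helper name `build_share_request` is hypothetical.

```python
def build_share_request(image_id, user_ids=None):
    """Return keyword arguments granting launch permission on an image.

    With no `user_ids`, the image is made public; otherwise only the
    listed AWS account IDs are authorized.
    """
    if not image_id:
        raise ValueError("image_id is required")
    if user_ids:
        grants = [{"UserId": str(uid)} for uid in user_ids]  # certain users
    else:
        grants = [{"Group": "all"}]                          # public
    return {"ImageId": image_id, "LaunchPermission": {"Add": grants}}


if __name__ == "__main__":
    # Placeholder image ID and collaborator account number.
    print(build_share_request("ami-00000000"))
    print(build_share_request("ami-00000000", user_ids=["123456789012"]))
```

Restricting access to named accounts is the more conservative choice when the image contains unpublished analysis results.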
  59. 59. Acknowledgments & Credits. Brad Chapman, Tim Booth, Bela Tiwari, Dawn Field: Cloud BioLinux development. Deepak Singh and AWS: compute credits on EC2 supporting initial development. J. Craig Venter Inst.: sponsorship / time allowed to work on this project. D. Gomez, E. Navarro, J. Shao, I. Singh, D. Edwards, M. Stout: JCVI tech innovation. Members of the Cloud BioLinux community: Enis Afgan, Michael Heuer, Richard Holland, Mark Jensen, Dave Messina, Steffen Möller, Roman Valls. Thank you!