Cloud BioLinux S.Africa

Published in: Technology, Business

Transcript of "Cloud BioLinux S.Africa"

Cloud BioLinux: pre-configured and on-demand computing for genomics without institutional, geographic or economic boundaries
Ntino Krampis, PhD
JCVI-NIAID-UL workshop, S. Africa, 2011

Slide 1: Low-cost sequencing technology
  • a new generation of small form-factor, bench-top sequencers (example: GS Junior by 454)
  • sequencing is becoming standard in biology and genetics research
  • besides whole genomes: RNA-seq, ChIP-seq, and metagenomics

Slide 2: Acquiring the sequence data is only the first step
  • downstream bioinformatic analysis is required for scientific discovery
  • Problem 1: sequence data analysis requires high-performance, expensive computing hardware
  • Problem 2: many commonly used bioinformatics tools are difficult to install; they are usually available only as source code and need technical expertise

Slide 3: Solving problem 1: computational capacity on the cloud
  • cloud computing: high-performance computers and data storage, remotely accessible through the Internet
  • we are all using the cloud already: Gmail, Google Docs, Yahoo! Mail, Facebook; you store and access data on a remote computer
  • cloud computers are rented pay-as-you-go from service providers such as Amazon Elastic Compute Cloud (EC2)

Slide 4: Cloud computing with Amazon EC2
  • a subsidiary company of Amazon.com offering pay-as-you-go cloud computing
  • cloud computers cost $0.085 - $2 per hour (maximum 64 GB memory and 8 processors)
  • used by companies that need additional computers without investing in hardware
  • physical locations: US East / West regions, EU, Singapore, Japan; researchers work at the closest location, then distribute results world-wide
  • democratizes access to computing resources across institutional, economic or national boundaries
  • 750 hours free for new users: http://aws.amazon.com/free/
  • additional services besides computing and storage: http://aws.amazon.com
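
The hourly price range quoted on this slide translates into a budget with simple arithmetic. The sketch below is only an illustration: the function name `ec2_compute_cost` and the example workloads are assumptions, and the two rates are the 2011 figures from the slide, not current prices.

```python
def ec2_compute_cost(hourly_rate_usd: float, hours: float, instances: int = 1) -> float:
    """Pay-as-you-go cost in USD: rate x hours x number of cloud computers."""
    if hourly_rate_usd < 0 or hours < 0 or instances < 0:
        raise ValueError("rate, hours and instances must be non-negative")
    return round(hourly_rate_usd * hours * instances, 2)


# Rates taken from the slide (2011 prices; assumed here only for illustration).
CHEAPEST_RATE = 0.085
LARGEST_RATE = 2.00

# Hypothetical workloads: one small computer for a day, ten large ones for a day.
one_small_day = ec2_compute_cost(CHEAPEST_RATE, hours=24)
ten_large_day = ec2_compute_cost(LARGEST_RATE, hours=24, instances=10)

print(f"1 small computer, 24 h:   ${one_small_day:.2f}")
print(f"10 large computers, 24 h: ${ten_large_day:.2f}")
```

Under these assumed rates, one day on the cheapest computer costs about two dollars, while ten of the largest computers for a day cost a few hundred.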

Slide 5: How does cloud computing work?
  • operating system, bioinformatics tools and data are installed on a Virtual Machine (VM)
  • a VM is uploaded to the cloud and runs using on-demand computing capacity from the EC2 cloud service
  • can be accessed world-wide from a desktop / laptop computer with Internet access
  • removes the need for local computing infrastructure at each laboratory
  • diagram labels: local desktop computers; Internet; remote Amazon EC2 cloud computing service; VM, VM, VM

Slide 6: Solving problem 2: Cloud BioLinux
  • bioinformatics tools are difficult to install
  • Cloud BioLinux offers a VM on the cloud with 100+ pre-installed and configured bioinformatics tools
  • sequence analysis, de novo assembly, annotation, phylogeny, molecular modeling, gene expression
  • a researcher can launch a practically unlimited number of VMs for large-scale data analysis

Slide 7: Starting our tutorial: using the cloud
  • sign in to the Amazon EC2 cloud control console: http://aws.amazon.com/console
  • Username: [email_address]
  • Password: SAcloud!

Slide 8: Launch Cloud BioLinux through the EC2 cloud console
  • click the Launch Instance button

Slide 9: Cloud BioLinux launch wizard: steps 1 & 2
  1. go to the “Community AMIs” tab and specify the Cloud BioLinux identifier ami-6011e409, then click to proceed
  2. select computational capacity: Large (2 CPU cores, 7.5 GB memory), then click to proceed

Slide 10: Cloud BioLinux launch wizard: step 3
  3. specify a password (“workshop”) for login to Cloud BioLinux in the “User Data” box, then click to proceed

Slide 11: Cloud BioLinux launch wizard: steps 4 & 5
  4. enter a value to uniquely identify your individual Cloud BioLinux VM, then click to proceed
  5. select “Proceed without a Key Pair”, then click to proceed

Slide 12: Cloud BioLinux launch wizard: steps 6 & 7
  6. choose the default security group, then click to proceed
  7. Are we all on the final screen? Click to complete the wizard.

Slide 13: Cloud BioLinux launch status
  • the wizard completes and we return to the console
  • the VM takes a few minutes to launch and will be in the “pending” (yellow) state
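
The seven wizard steps above can also be expressed as a single scripted request. The following is a minimal sketch, assuming the boto3 library (not used in the workshop itself); the helper names `build_launch_request` and `launch_cloud_biolinux` are hypothetical. It also assumes that "Large" corresponds to the `m1.large` instance type, that the identifying value from step 4 is stored as a `Name` tag, and that the AMI lives in the `us-east-1` region. The 2011 AMI may no longer be available.

```python
def build_launch_request(vm_name: str, password: str,
                         ami_id: str = "ami-6011e409",
                         instance_type: str = "m1.large") -> dict:
    """Collect the wizard choices as keyword arguments for EC2 run_instances."""
    return {
        "ImageId": ami_id,                # step 1: Cloud BioLinux identifier
        "InstanceType": instance_type,    # step 2: assumed mapping of "Large"
        "UserData": password,             # step 3: login password in "User Data"
        "TagSpecifications": [{           # step 4: assumed to be a Name tag
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": vm_name}],
        }],
        # step 5: no "KeyName" entry, i.e. proceed without a key pair
        "SecurityGroups": ["default"],    # step 6: default security group
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch_cloud_biolinux(vm_name: str, password: str,
                          region: str = "us-east-1") -> str:
    """Step 7: send the request and return the new instance ID.

    Needs boto3 and AWS credentials; the region is an assumption.
    """
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**build_launch_request(vm_name, password))
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    # Only prints the request; call launch_cloud_biolinux() to actually launch.
    print(build_launch_request("my-name-biolinux", "workshop"))
```

Keeping the request builder separate from the network call makes it possible to inspect the choices before any billable instance is started.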

Slide 14: While waiting for Cloud BioLinux to boot up...
  • public datasets on Amazon EC2: http://aws.amazon.com/publicdatasets
  • GenBank and Ensembl databases, 1000 human genomes project, influenza
  • data are hosted for free; users pay only for the computing time used
  • community program: http://aws.amazon.com/datasets/submit
  • advantage: the data are placed where computational capacity is available
  • Amazon EC2 education-research grants: http://aws.amazon.com/education/
  Any questions before we get to the exercises?

Slide 15: final step
  • in the console, click “Instances”
  • find your unique Cloud BioLinux VM using the name you specified in step 4
  • copy its “Public DNS” (server address / URL on the cloud)
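
Looking up the "Public DNS" can be scripted as well. This is a sketch under assumptions: it uses boto3 (not part of the workshop), assumes the step 4 value is stored as a `Name` tag, and the helper names `find_public_dns` and `lookup_public_dns` and the sample hostname are hypothetical.

```python
from typing import Optional


def find_public_dns(response: dict, vm_name: str) -> Optional[str]:
    """Return the Public DNS of the running instance tagged Name=vm_name.

    `response` has the shape returned by EC2 describe_instances.
    """
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            state = instance.get("State", {}).get("Name")
            if tags.get("Name") == vm_name and state == "running":
                # A "pending" VM has no address yet, hence the state check.
                return instance.get("PublicDnsName") or None
    return None


def lookup_public_dns(vm_name: str, region: str = "us-east-1") -> Optional[str]:
    """Query EC2 for the VM; needs boto3 and AWS credentials (region assumed)."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instances(
        Filters=[{"Name": "tag:Name", "Values": [vm_name]}]
    )
    return find_public_dns(response, vm_name)


# Hand-written sample response with a made-up hostname, for illustration only.
sample = {"Reservations": [{"Instances": [{
    "PublicDnsName": "ec2-203-0-113-10.compute-1.amazonaws.com",
    "State": {"Name": "running"},
    "Tags": [{"Key": "Name", "Value": "my-name-biolinux"}],
}]}]}
print(find_public_dns(sample, "my-name-biolinux"))
```

The function returns `None` while the VM is still "pending", which mirrors waiting for the yellow state to clear in the console.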

Slide 16: Connecting remotely to Cloud BioLinux
  Click the NX client icon on your computer's desktop:
  A. paste the DNS in the “Host” box
  B. select “Unix”, “Gnome”, and the remote desktop size
  C. Login: “ubuntu” is the default user; “workshop” is the password we set

Slide 19
  • two S. aureus strains and one S. carnosus species
  • drag & drop the .fna files onto the Cloud BioLinux desktop

Slide 31
  • save and share the Virtual Machine (VM) containing your analysis results with a collaborator
  • storage costs: $0.10 / GB / month
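
The storage rate on this slide can be turned into an estimate for a saved VM. This is a minimal sketch: the function name `vm_storage_cost` and the 20 GB example size are assumptions, and the rate is the 2011 figure from the slide.

```python
def vm_storage_cost(size_gb: float, months: int = 1,
                    rate_usd_per_gb_month: float = 0.10) -> float:
    """Storage cost in USD: size x monthly rate x number of months."""
    if size_gb < 0 or months < 0 or rate_usd_per_gb_month < 0:
        raise ValueError("size, months and rate must be non-negative")
    return round(size_gb * rate_usd_per_gb_month * months, 2)


# Hypothetical 20 GB VM snapshot kept for one month and for half a year.
print(f"20 GB, 1 month:  ${vm_storage_cost(20):.2f}")
print(f"20 GB, 6 months: ${vm_storage_cost(20, months=6):.2f}")
```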

Slide 32: Cloud BioLinux: whole system snapshot exchange
  • authorize access to the VM: public or for certain users
  • other researchers can access the VM with all the software, data, and analysis results directly on the cloud
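
The two access modes on this slide (public, or for certain users) could be scripted roughly as follows. This sketch assumes the saved VM is an Amazon Machine Image shared through boto3's `modify_image_attribute` call; the helper names `launch_permission` and `share_vm_image`, the region, and the account ID in the example are hypothetical.

```python
from typing import Iterable, Optional


def launch_permission(user_ids: Optional[Iterable[str]] = None,
                      public: bool = False) -> dict:
    """Build the LaunchPermission argument: everyone, or listed AWS accounts."""
    if public:
        entries = [{"Group": "all"}]
    else:
        entries = [{"UserId": uid} for uid in (user_ids or [])]
    if not entries:
        raise ValueError("choose public=True or give at least one account ID")
    return {"Add": entries}


def share_vm_image(image_id: str, user_ids: Optional[Iterable[str]] = None,
                   public: bool = False, region: str = "us-east-1") -> None:
    """Grant access to a saved image; needs boto3 and AWS credentials."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.modify_image_attribute(
        ImageId=image_id,
        LaunchPermission=launch_permission(user_ids, public),
    )


# Example with a made-up 12-digit collaborator account ID.
print(launch_permission(user_ids=["123456789012"]))
print(launch_permission(public=True))
```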

Acknowledgments & Credits
  • Brad Chapman, Tim Booth, Bela Tiwari, Dawn Field: Cloud BioLinux development
  • Deepak Singh and AWS: compute credits on EC2 supporting initial development
  • J. Craig Venter Inst.: sponsorship / time allowed to work on this project
  • D. Gomez, E. Navarro, J. Shao, I. Singh, D. Edwards, M. Stout: JCVI tech innovation
  • Members of the Cloud BioLinux community: Enis Afgan, Michael Heuer, Richard Holland, Mark Jensen, Dave Messina, Steffen Möller, Roman Valls
  Thank you!