CHPC Workshop Morning Session

  1. Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers
     Ntinos Krampis, Asst. Professor, J. Craig Venter Institute
     kkrampis@jcvi.org
     http://www.jcvi.org/cms/about/bios/kkrampis/
  2. Workshop Schedule
     Morning Session: Background Presentations and Prep
     11:00 – 11:45 Introduction to Cloud Computing for Bioinformatics
     11:45 – 12:00 Questions and Answers
     12:00 – 12:30 Using Cloud BioLinux on the Amazon EC2 Cloud
     12:30 – 13:00 Preparation: install Cloud Virtual Machines on laptops
     Afternoon Session: Hands on Session
     14:00 – 14:30 Preparation: install Cloud Virtual Machines on laptops
     14:30 – 16:00 Bioinformatic Analysis using Cloud BioLinux
     16:30 – 17:30 Customized Bioinformatics Solutions for Participants
  3. A little bit of background information...
     ● Konstantinos (Ntinos) Krampis, started working at J. Craig Venter Inst. (JCVI) in 2009
     ● Background training in Molecular Biology, PhD in Bioinformatics
     ● Research: cloud and high-performance computing, genome assembly
     ● Projects: Cloud BioLinux (cloudbiolinux.org)
     ● Taught Cloud BioLinux workshop at Univ. of Limpopo last May
     ● Slides available at http://www.slideshare.com/agbiotec
     ● Email me for slides, meeting, questions: kkrampis@jcvi.org
  4. J. Craig Venter Institute (JCVI): Large-scale genome sequencing and bioinformatics computing
     ● Human Microbiome Project (HMP): genome sequencing of microbes living in and on the human body
     ● Global Ocean Sampling (GOS) survey: genome sequencing of microbes sampled from oceans around the world
  5. JCVI: sequencing and computing infrastructure
     ● core sequencing laboratory: 454, Solexa, HiSeq, IonTorrent on the way
     ● dedicated bioinformatics department (57 bioinformaticians)
     ● large-scale computations, ~1000 node Sun Grid Engine (SGE) cluster
  6. Low-cost sequencing instruments
     ● small form-factor sequencers available: GS Junior by 454, MiSeq by Illumina
     ● bacterial, viral, small fungal genomes, sequencing for variant discovery
     ● sequencing as a standard technique in molecular biology and genetics
     ● RNA-seq (instead of microarrays) and ChIP-seq (instead of yeast 2-hybrid)
     http://www.gsjunior.com/
     http://www.illumina.com/systems/miseq.ilmn
  7. More small laboratories doing genome sequencing
     [Figure: amount of sequencing plotted against number of labs]
     acquiring the sequence data is only the first step...
  8. Sequencing instruments shipped with minimal computational capacity
     ● Problem 1: sequencing data analysis requires expensive, high-performance computing hardware, for example for genome assembly, BLAST, genome annotation
     ● Problem 2: much bioinformatics software is difficult for biologists to install; it needs technical expertise with operating systems, compiling source code etc.
  9. Each lab building their own informatics infrastructure?
     ● small labs need additional funds to build computing clusters
     ● funds for bioinformaticians and software developers to maintain the clusters and software
     ● duplication of effort across labs
     ● sub-optimal utilization of the hardware due to small amounts of sequencing
  10. Large sequencing centers offering bioinformatics analysis services?
      ● Bioinformatic Resource Centers (BRC)
      ● bioinformatic analysis coupled with sequencing of an organism
      ● mostly provide data browsing and few analysis tools to the public
      ● cannot serve the bioinformatic needs of every small lab acquiring a sequencing instrument
      ● need end-to-end solutions: users submit sequence data and get final annotation
  11. Solving Problem 1: using high-performance computing hardware available on the cloud
      ● cloud computing: high-performance computers and data storage, remotely accessible through the Internet
      ● we are all using the cloud: Gmail, Google Docs, Facebook; you store and access data on a remote computer
      ● cloud computers rented pay-as-you-go by service providers such as Amazon Elastic Compute Cloud (EC2)
  12. The Amazon EC2 cloud computing service
      ● a subsidiary company of Amazon.com, rents computing pay-as-you-go
      ● cloud computers cost $0.085 - $2 per hr (max 64GB memory and 8 processors)
      ● used by companies that need additional computers without investing in hardware
      ● physical locations: US East / West regions, EU, Singapore, Japan
      ● democratizes access to computing resources outside of institutional, economic or national boundaries
      750 hours free for new users, sign up here: http://aws.amazon.com/free/
      http://aws.amazon.com
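To make the pay-as-you-go pricing concrete, the arithmetic can be sketched in a few lines of Python. The two hourly rates are the ends of the range quoted on the slide. The function name, the example job sizes, and the idea that free hours are simply subtracted from billable hours are illustrative assumptions, not Amazon's actual billing rules.

```python
def compute_cost(hours, rate_per_hour, free_hours=0):
    """Estimated cost in USD for renting one cloud computer.

    free_hours is subtracted from the billable hours. This is an
    illustrative simplification, not Amazon's actual billing logic.
    """
    billable = max(hours - free_hours, 0)
    return round(billable * rate_per_hour, 2)


LOW_RATE = 0.085   # USD per hour, cheapest machine quoted on the slide
HIGH_RATE = 2.00   # USD per hour, largest machine quoted on the slide

# Hypothetical job: a 48-hour genome assembly on the largest machine.
assembly_cost = compute_cost(48, HIGH_RATE)

# Hypothetical usage: 1000 hours on the cheapest machine, 750 of them free.
small_cost = compute_cost(1000, LOW_RATE, free_hours=750)

print(assembly_cost, small_cost)
```

Under these assumptions a two-day run on the largest machine costs less than a hundred dollars, which is the comparison the slide draws against buying cluster hardware.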
  13. How does cloud computing work?
      ● cloud computing evolved from virtualization technology
      ● operating system, bioinformatics software and data are installed on a Virtual Machine (VM)
      ● a VM is an emulation of a computer system, in the form of a single, executable binary file
      ● runs inside a physical computer such as a laptop
      ● why virtualization: it simplifies IT maintenance
  14. How does cloud computing work?
      ● a VM is uploaded to the cloud service; it runs by renting computing capacity from Amazon EC2 (up to 64GB RAM / 8 core computers)
      ● bioinformatics software can be executed from anywhere in the world through a desktop computer with Internet access
      ● removes need for local computer clusters at each laboratory
      ● alternatively, if you have a cluster locally, it can run on a private cloud
      [Diagram: local computers connected through the Internet to VMs on the remote Amazon EC2 cloud computing service]
  15. Solving Problem 2: pre-installed and configured bioinformatics software on cloud Virtual Machines
      ● Cloud BioLinux: a publicly accessible Virtual Machine (VM) on the Amazon EC2 cloud
      ● 100+ pre-configured and installed bioinformatics software tools
      ● sequence analysis, genome assembly, annotation, phylogeny, molecular modeling, gene expression
      ● a researcher can initiate a practically unlimited number of VMs for large-scale data analysis and access them using a local computer
  16. Cloud BioLinux for Bioinformatics
      ● how the Cloud BioLinux project came to be, what it can offer to small labs for genome sequence analysis
      ● where and how do I run Cloud BioLinux, especially if I am not a computer expert
      ● besides end-users, bioinformatics developers are provided a framework for modifying and sharing VM configurations and data
  17. Creating Cloud BioLinux
      ● JCVI bioinformatics cloud computing research
      ● NEBC BioLinux software repository (tinyurl.com/BioLinux-NEBC)
      ● community effort at BOSC 2009 – 11
      ● initially: a VM on Amazon EC2 with the tools copied and installed from the NEBC repository
      ● now: developers' framework for creating customized cloud VMs
      ● major contributors: http://www.cloudbiolinux.org
  18. Research at JCVI with Cloud BioLinux
      ● Eucalyptus private cloud currently installed at JCVI, OpenStack on the way
      ● open-source cloud platforms, fully compatible with Amazon EC2 (identical API)
      ● easy to set up on a local computer cluster, comes with Ubuntu Linux server
      ● develop VMs in-house with complex bioinformatics pipelines pre-installed and upload to Amazon EC2 for public access
      ● free to use on your laptop with VirtualBox
  19. 10 min of questions....
  20. Research at JCVI with Cloud BioLinux
      ● bioinformatics data analysis pipelines have complex dependencies: operating system, software libraries, reference databases etc.
      ● approach: pre-install pipelines and all dependencies in a single binary VM file using a private cloud
      ● upload VM to Amazon EC2: pipelines ready to execute, no need to purchase hardware
      JCVI - GSC
  21. Research at JCVI with Cloud BioLinux
      ● Funded by NIAID until 2013
      ● port complex bioinformatic pipelines to the Amazon EC2 cloud
      ● focus on viral reads-to-annotation data pipelines
      ● benefits small laboratories that lack resources
      ● if you own a cluster: download and run the VM on your private Eucalyptus or OpenStack cloud
      JCVI - GSC
  22. Running Cloud BioLinux on the Amazon EC2 cloud
      Account on the Amazon EC2 cloud: http://aws.amazon.com/ec2
  23. Launch Cloud BioLinux through the EC2 cloud console
      http://tinyurl.com/cloud-biolinux-tutorial
  24. Cloud BioLinux launch wizard: steps 1 & 2
      1. go to the “Community AMIs” tab, specify the Cloud BioLinux VM identifier (most recent update: cloudbiolinux.org)
      2. select computational capacity
  25. Cloud BioLinux launch wizard: step 3
      3. specify a password for login to Cloud BioLinux in the “User Data” box
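The three wizard steps can also be expressed programmatically. The sketch below is not part of the workshop material: it uses the boto3 library as a modern stand-in for the web console, and the AMI identifier, instance type, region and password are placeholders. Passing the password as raw user data mirrors step 3 of the wizard; the exact format the Cloud BioLinux image expects there is an assumption.

```python
def build_launch_request(ami_id, instance_type, password):
    """Collect the three wizard choices into one request dictionary."""
    if not ami_id.startswith("ami-"):
        raise ValueError("expected an AMI identifier such as 'ami-...'")
    return {
        "ImageId": ami_id,              # step 1: Cloud BioLinux VM identifier
        "InstanceType": instance_type,  # step 2: computational capacity
        "UserData": password,           # step 3: login password via "User Data"
        "MinCount": 1,                  # launch exactly one VM
        "MaxCount": 1,
    }


def launch(request, region="us-east-1"):
    """Send the request to EC2. Needs boto3 installed and AWS credentials."""
    import boto3  # imported here so the sketch loads without boto3 installed
    ec2 = boto3.client("ec2", region_name=region)
    return ec2.run_instances(**request)


# Placeholder values: look up the current AMI identifier on cloudbiolinux.org.
request = build_launch_request("ami-00000000", "m1.large", "choose-a-password")
print(request)
```

Calling `launch(request)` would start a billable VM, so the example only builds and prints the request.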
  26. remote desktop client
  27. Distributing Data Analysis Results with Cloud BioLinux
  28. Distributing Data Analysis Results with Cloud BioLinux: Whole System Snapshot Exchange
      ● how difficult is it to share bioinformatics work on your computer with a collaborator?
      ● capture the state of the computing system (OS + software), data, analysis results
      ● make VM snapshots: executable binary file, replica of a running VM
      ● distribute a VM snapshot with pre-installed software and data analysis results
      ● collaborators can replicate, re-run, add to your analysis results
      ● a snapshot can be shared directly on the Amazon cloud, downloaded to a private cloud, or run on a desktop using virtualization software
  29. Cloud BioLinux: whole system snapshot exchange
      storage cost: $0.10 / GB / month
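As a worked example of the storage price, the following Python sketch estimates what it costs to keep a shared snapshot available. The rate is the one quoted on the slide; the 20 GB snapshot size and the 12-month period are made-up numbers for illustration.

```python
STORAGE_RATE = 0.10  # USD per GB per month, as quoted on the slide


def snapshot_storage_cost(size_gb, months, rate=STORAGE_RATE):
    """Cost in USD of storing one VM snapshot for a number of months."""
    return round(size_gb * months * rate, 2)


# Hypothetical snapshot: 20 GB of OS, software, data and results, kept one year.
cost = snapshot_storage_cost(20, 12)
print(cost)
```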
  30. Cloud BioLinux: whole system snapshot exchange
      authorize access to the VM: public or for certain users
      other researchers can access the VM with all the software, data, analysis results directly on the cloud
  31. Cloud BioLinux for Software Developers
      ● Issue 1: for researchers with sensitive data a public cloud might not be an option
      ● Problem 1: moving VMs across clouds is not trivial, needs low-level operations
      ● Issue 2: bioinformatic specializations (e.g. sequencing, phylogeny, protein structure)
      ● Problem 2: one VM with many tools to fit all becomes over-sized
      ● Cloud BioLinux VM deployment framework
  32. Cloud BioLinux for Software Developers
      ● framework to customize software installed in cloud VM / image
      ● based on the Python Fabric automated deployment tool
      ● software components listed in simple text configuration files
      ● edit the files to mix and match software according to your needs
      ● use source code repository to share configuration files for customized VMs
      ● start with a bare-bones VM on Amazon EC2 or Eucalyptus private cloud
      ● Fabric scripts automatically install specified software based on configuration files
      Free, available from: https://github.com/chapmanb/cloudbiolinux
  33. Software domains in Cloud BioLinux: genome sequencing, de novo assembly, annotation, phylogeny, molecular structures, gene expression analysis
      ● high-level configuration describing software groups
      ● for each group, individual bioinformatics tools
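The two-level configuration idea (software groups, then individual tools per group) can be illustrated with a small Python sketch. The file format, group names and tool lists below are hypothetical and chosen only for illustration; they are not the actual Cloud BioLinux configuration files, and the real framework hands the resulting list to Fabric scripts for installation rather than printing it.

```python
def parse_config(text):
    """Parse 'group: tool1, tool2' lines into a dict; '#' starts a comment."""
    groups = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, tools = line.split(":", 1)
        groups[name.strip()] = [t.strip() for t in tools.split(",") if t.strip()]
    return groups


def select_tools(groups, wanted):
    """Return the de-duplicated tool list for the requested groups, in order."""
    selected = []
    for name in wanted:
        for tool in groups[name]:
            if tool not in selected:
                selected.append(tool)
    return selected


# Hypothetical configuration, not the real Cloud BioLinux file format.
CONFIG = """
assembly: velvet, mira
phylogeny: phyml, mrbayes
alignment: blast, mafft
"""

groups = parse_config(CONFIG)
tools = select_tools(groups, ["assembly", "alignment"])
print(tools)
```

Editing the `wanted` list is the "mix and match" step: a lab that only does assembly and alignment gets a smaller VM than one that installs every group.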
  34. Acknowledgments & Credits
      Brad Chapman - development of the Fabric scripts, website
      Tim Booth, Mesude Bicak, Dawn Field - BioLinux 6.0 development
      Enis Afgan - CloudMan and Cloud BioLinux integration
      Members of the Cloud BioLinux community: http://groups.google.com/group/cloudbiolinux
      And again our contacts: kkrampis@jcvi.org
      Thank you!
      http://www.cloudbiolinux.org
      http://www.slideshare.com/agbiotec
