Cloud Computing Solutions for Genomics
Across Geographic, Institutional and Economic Barriers



                      Ntinos Krampis
                       Asst. Professor
                  J. Craig Venter Institute
                     kkrampis@jcvi.org

         http://www.jcvi.org/cms/about/bios/kkrampis/
Workshop Schedule

      Morning Session: Background Presentations and Prep

11:00 – 11:45    Introduction to Cloud Computing for Bioinformatics
11:45 – 12:00    Questions and Answers
12:00 – 12:30   Using Cloud BioLinux on the Amazon EC2 Cloud
12:30 – 13:00   Preparation: install Cloud Virtual Machines on laptops

                Afternoon Session: Hands on Session

14:00 – 14:30 Preparation: install Cloud Virtual Machines on laptops
14:30 – 16:00 Bioinformatic Analysis using Cloud BioLinux
16:30 – 17:30 Customized Bioinformatics Solutions for Participants
A little bit of background information...

● Konstantinos (Ntinos) Krampis, started working at J. Craig Venter Inst.
(JCVI) in 2009

●   Background training in Molecular Biology, PhD in Bioinformatics

●   Research: cloud and high-performance computing, genome assembly

●   Projects: Cloud BioLinux (cloudbiolinux.org)

●   Taught Cloud BioLinux workshop at Univ. of Limpopo last May

●   Slides available at http://www.slideshare.com/agbiotec

●   Email me for slides, meeting, questions:   kkrampis@jcvi.org
J. Craig Venter Institute (JCVI)
    Large-scale genome sequencing and bioinformatics computing

●Human Microbiome Project (HMP): genome
sequencing of microbes living in and on the human
body

● Global Ocean Sampling (GOS) survey: genome
sequencing of microbes sampled from oceans around
the world
JCVI: sequencing and computing infrastructure

●   core sequencing laboratory: 454, Solexa, HiSeq, IonTorrent on the way

●   dedicated bioinformatics department (57 bioinformaticians)

●   large-scale computations, ~1000 node Sun Grid Engine (SGE) cluster
Low-cost sequencing instruments

●   small form-factor sequencers available: GS Junior by 454, MiSeq by Illumina

●   bacterial, viral, small fungal genomes, sequencing for variant discovery

●   sequencing as a standard technique in molecular biology and genetics

●   RNA-seq (instead of microarrays) and ChIP-seq (instead of yeast 2-hybrid)




    http://www.gsjunior.com/     http://www.illumina.com/systems/miseq.ilmn
More small laboratories doing genome sequencing




    [Figure: amount of sequencing (vertical axis) vs. number of labs (horizontal axis)]




acquiring the sequence data is only the first step...
Sequencing instruments shipped with minimal
                         computational capacity

●   Problem 1: sequencing data analysis requires expensive, high-performance
computing hardware, for example for genome assembly, BLAST, genome annotation

●   Problem 2: much bioinformatics software is difficult for biologists to install;
it requires technical expertise with operating systems, compiling source code, etc.
Each lab building their own informatics infrastructure ?


●small labs need additional funds to build
computing clusters

●funds for bioinformaticians and software
developers to maintain the clusters and
software

●   duplication of effort across labs

●sub-optimal utilization of the hardware
due to small amounts of sequencing
Large sequencing centers offering bioinformatics analysis services ?



●   Bioinformatic Resource Centers (BRC)

●bioinformatic analysis coupled with sequencing
of an organism

● mostly provide data browsing and few analysis
tools to the public

●cannot serve the bioinformatic needs of every
small lab acquiring a sequencing instrument

●need end-to-end solutions, users submit
sequence data and get final annotation
Solving Problem 1: using high-performance computing
                   hardware available on the cloud

●cloud computing : high performance
computers and data storage, remotely
accessible through the Internet

●we are all using the cloud: Gmail,
Google Docs, FaceBook; you store and
access data on a remote computer

●cloud computers rented pay-as-you-go
by service providers such as Amazon
Elastic Compute Cloud (EC2)
The Amazon EC2 cloud computing service
●   a subsidiary company of Amazon.com, rents computing pay-as-you-go

●   cloud computers cost $0.085 - $2 per hr (max 64GB memory and 8 processors)

●   used by companies that need additional computers without investing in hardware

●   physical locations: US East / West regions, EU, Singapore, Japan

● democratizes access to computing resources outside of institutional, economic or
national boundaries




                                   750 hours free for new users, sign up here:
                                   http://aws.amazon.com/free/
    http://aws.amazon.com
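As a rough illustration of the pay-as-you-go pricing above, the cost of a job is the hourly rate times the hours used times the number of machines rented. The sketch below uses the two hourly rates quoted on the slide. The function name and the example workloads are assumptions made for illustration, and actual Amazon pricing changes over time.

```python
def ec2_compute_cost(hourly_rate_usd, hours, instances=1):
    """Estimate pay-as-you-go cost: rate x hours x number of machines."""
    if hourly_rate_usd < 0 or hours < 0 or instances < 0:
        raise ValueError("rate, hours and instances must be non-negative")
    return hourly_rate_usd * hours * instances


# Hourly rates quoted on the slide (smallest and largest machine).
LOW_RATE_USD = 0.085
HIGH_RATE_USD = 2.00

if __name__ == "__main__":
    # Hypothetical workloads, chosen only to show the arithmetic.
    small_job = ec2_compute_cost(LOW_RATE_USD, hours=100)
    big_job = ec2_compute_cost(HIGH_RATE_USD, hours=24, instances=4)
    print(f"100 h on one small machine: ${small_job:.2f}")
    print(f"24 h on four large machines: ${big_job:.2f}")
```

Because billing is per hour of use, shutting down VMs as soon as an analysis finishes is what keeps the total low.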
How does cloud computing work ?


●   cloud computing evolved from virtualization technology

● operating system, bioinformatics software and data, are
installed on a Virtual Machine (VM)

● a VM is an emulation of a computer system, in the form of a
single, executable binary file

●   runs inside a physical computer such as a laptop

●   why Virtualization: simplify IT maintenance
How does cloud computing work ?

● a VM is uploaded to the cloud service; it runs by renting computing
capacity from Amazon EC2 (up to 64GB RAM / 8 core computers)

● bioinformatics software can be executed from anywhere in the world
through a desktop computer with Internet access

● removes need for local computer clusters at each laboratory

● alternatively, if you have a cluster locally, it can run on a private cloud

[Diagram: local computers connect through the Internet to VMs on the remote Amazon EC2 cloud computing service]
Solving problem 2: pre-installed and
configured bioinformatics software on cloud
             Virtual Machines

●Cloud BioLinux: a publicly accessible Virtual
Machine (VM) on the Amazon EC2 cloud

●100+ pre-configured and installed bioinformatics
software tools

●sequence analysis, genome assembly, annotation,
phylogeny, molecular modeling, gene expression

●a researcher can initiate a practically unlimited
number of VMs for large-scale data analysis and
access them using a local computer
Cloud BioLinux for Bioinformatics

● how the Cloud BioLinux project came to be, what it can offer to small
labs for genome sequence analysis

●where and how do I run Cloud BioLinux , especially if I am not a
computer expert

● besides end-users, bioinformatics developers are provided a framework
for modifying and sharing VM configurations and data
Creating Cloud BioLinux

●   JCVI bioinformatics cloud computing research

●   NEBC BioLinux software repository (tinyurl.com/BioLinux-NEBC)

●   community effort at BOSC 2009 – 11

●   initially: a VM on Amazon EC2 with the tools copied and installed from the NEBC repository

●   now: developer's framework for creating customized cloud VMs

                               ●   major contributors:




http://www.cloudbiolinux.org
Research at JCVI with Cloud BioLinux


●Eucalyptus private cloud currently installed at JCVI,
OpenStack on the way

●open-source cloud platforms, fully compatible with
Amazon EC2 (identical API)

●easy to set up on a local computer cluster, comes with
Ubuntu Linux server

●develop VMs in-house with complex bioinformatics
pipelines pre-installed and upload to Amazon EC2 for
public access

●   free to use on your laptop with VirtualBox
10 min of questions....
Research at JCVI with Cloud BioLinux


● bioinformatics data analysis pipelines have complex
dependencies: operating system, software libraries,
reference databases etc.

● approach: pre-install pipelines and all dependencies
in a single binary VM file using a private cloud

●upload VM on Amazon EC2: pipelines ready to
execute, no need to purchase hardware
                                                         JCVI - GSC
Research at JCVI with Cloud BioLinux


●   Funded by NIAID until 2013

●port complex bioinformatic pipelines to the
Amazon EC2 cloud

●   focus on viral reads-to-annotation data pipelines

●   benefits to small laboratories that lack resources

●   if you own a cluster: download and run the VM on your
private Eucalyptus or OpenStack cloud
                                                         JCVI - GSC
Running Cloud BioLinux on the Amazon EC2 cloud




Account on the Amazon EC2 cloud   http://aws.amazon.com/ec2
Launch Cloud BioLinux through the EC2 cloud console




        http://tinyurl.com/cloud-biolinux-tutorial
Cloud BioLinux launch wizard: steps 1 & 2

1. go to the “Community AMIs” tab, specify the Cloud BioLinux VM identifier
   (most recent update: cloudbiolinux.org)

2. select computational capacity
Cloud BioLinux launch wizard: step 3


3. specify a password for login to Cloud BioLinux in the “User Data” box
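The three wizard steps amount to choosing three values: the VM identifier, the computational capacity, and the login password placed in “User Data”. A minimal sketch of collecting them programmatically follows. The key names mirror the keyword arguments of `run_instances` in the boto3 library, which is an assumption of this sketch and is not mentioned in the slides. The AMI identifier and instance type are placeholders (the current identifier is listed on cloudbiolinux.org), and passing the password as plain User Data text is also assumed.

```python
def build_launch_request(ami_id, instance_type, login_password):
    """Collect the launch wizard choices into one request dictionary."""
    if not ami_id.startswith("ami-"):
        raise ValueError("an AMI identifier is expected to start with 'ami-'")
    if not login_password:
        raise ValueError("a login password is required")
    return {
        "ImageId": ami_id,              # step 1: Cloud BioLinux VM identifier
        "InstanceType": instance_type,  # step 2: computational capacity
        "UserData": login_password,     # step 3: password for remote login
        "MinCount": 1,                  # launch exactly one VM
        "MaxCount": 1,
    }


if __name__ == "__main__":
    # Placeholder values only; replace with the identifier from cloudbiolinux.org.
    request = build_launch_request("ami-00000000", "m1.large", "choose-a-password")
    print(request)
```

With AWS credentials configured, such a dictionary could be passed as `boto3.client("ec2").run_instances(**request)`; that call is left out so the sketch runs without an AWS account.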




remote desktop
client
Distributing Data Analysis Results with Cloud BioLinux

    Whole System Snapshot Exchange


●   how difficult is it to share bioinformatics work on your computer with a collaborator ?
●   capture the state of the computing system (OS + software), data, analysis results
●   make VM snapshots: executable binary file, replica of a running VM

●   distribute a VM snapshot with pre-installed software and data analysis results

●   collaborators can replicate, re-run, add to your analysis results

● a snapshot can be shared directly on the Amazon cloud, downloaded on a private
cloud or run on desktop using virtualization software
Cloud BioLinux: whole system snapshot exchange




storage cost: $0.10 / GB / month
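To put the storage rate in perspective, keeping a snapshot costs its size in GB times the number of months times the monthly rate. The sketch below uses the rate quoted above; the 20 GB snapshot size and six-month period are hypothetical values chosen only for illustration.

```python
STORAGE_RATE_USD_PER_GB_MONTH = 0.10  # rate quoted on the slide


def snapshot_storage_cost(size_gb, months, rate=STORAGE_RATE_USD_PER_GB_MONTH):
    """Cost of keeping one VM snapshot in cloud storage."""
    if size_gb < 0 or months < 0 or rate < 0:
        raise ValueError("size, months and rate must be non-negative")
    return size_gb * months * rate


if __name__ == "__main__":
    # Hypothetical snapshot: 20 GB kept for 6 months.
    print(f"${snapshot_storage_cost(20, 6):.2f}")
```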
Cloud BioLinux: whole system snapshot exchange
authorize access to the VM: public or for certain users

   other researchers can access the VM with all the
 software, data, analysis results directly on the cloud
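The choice between public access and access for certain users can be written down as a small permission structure. The sketch below builds one in the shape of the `LaunchPermission` argument of boto3's `modify_image_attribute` call. Using boto3 is an assumption of this sketch (the slides show the web console), and the account number is a placeholder.

```python
def build_launch_permission(public=False, account_ids=()):
    """Describe who may launch a shared VM image."""
    grants = []
    if public:
        # Anyone with an AWS account may launch the image.
        grants.append({"Group": "all"})
    for account_id in account_ids:
        # Only the listed AWS accounts may launch the image.
        grants.append({"UserId": str(account_id)})
    if not grants:
        raise ValueError("choose public access or list at least one account")
    return {"Add": grants}


if __name__ == "__main__":
    print(build_launch_permission(public=True))
    print(build_launch_permission(account_ids=["123456789012"]))  # placeholder ID
```

The result could be supplied as `LaunchPermission=...` together with the image identifier; the network call itself is omitted so the example runs offline.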
Cloud BioLinux for Software Developers


●   Issue 1: for researchers with sensitive data a public cloud might not be an option
●   Problem 1: moving VMs across clouds is not trivial, need low level operations


●   Issue 2: bioinformatic specializations (ex. sequencing, phylogeny, protein structure)
●   Problem 2: one VM with many tools to fit all becomes over-sized


●   Cloud BioLinux VM deployment framework
Cloud BioLinux for Software Developers


●   framework to customize software installed in cloud VM / image
●   based on Python Fabric automated deployment tool
●   software components listed in simple text configuration files
●   edit the files to mix and match software according to your needs
●   use source code repository to share configuration files for customized VMs
●   start with a bare-bones VM on Amazon EC2 or Eucalyptus private cloud
●   Fabric scripts automatically install specified software based on configuration files




Free, available from: https://github.com/chapmanb/cloudbiolinux
software domains in Cloud BioLinux:

Genome sequencing, de novo assembly, annotation,
 phylogeny, molecular structures, gene expression
                    analysis

high-level configuration describing software groups
   for each group individual bioinformatics tools
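The two-level layout described above (software groups at the top, individual tools within each group) can be sketched in a few lines. This is not the actual Cloud BioLinux file format or Fabric code, which is available from the GitHub repository; the group names, tool names and the `apt-get` command are illustrative assumptions, and the sketch only builds the install command without running it.

```python
# Hypothetical configuration: software group -> tools in that group.
TOOLS_BY_GROUP = {
    "assembly": ["velvet", "abyss"],
    "phylogeny": ["mrbayes", "phyml"],
    "alignment": ["bwa", "bowtie"],
}


def select_tools(enabled_groups, tools_by_group):
    """Mix and match: expand the enabled groups into a list of tools."""
    selected = []
    for group in enabled_groups:
        if group not in tools_by_group:
            raise KeyError(f"unknown software group: {group}")
        for tool in tools_by_group[group]:
            if tool not in selected:  # skip tools already listed
                selected.append(tool)
    return selected


def build_install_command(tools):
    """Build (but do not run) a package install command for a bare-bones VM."""
    if not tools:
        raise ValueError("no tools selected")
    return ["apt-get", "install", "-y", *tools]


if __name__ == "__main__":
    # A customized VM for a lab that only needs assembly and phylogeny tools.
    tools = select_tools(["assembly", "phylogeny"], TOOLS_BY_GROUP)
    print(" ".join(build_install_command(tools)))
```

Editing the list of enabled groups is the only change needed to define a differently specialized VM, which is the point of keeping the configuration separate from the deployment scripts.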
Acknowledgments & Credits

Brad Chapman     - development of the Fabric scripts, website
Tim Booth, Mesude Bicak, Dawn Field – BioLinux 6.0 development
Enis Afgan – Cloudman and Cloud BioLinux integration


Members of the Cloud BioLinux community:
http://groups.google.com/group/cloudbiolinux


And again our contacts:
kkrampis@jcvi.org                                 Thank you !
http://www.cloudbiolinux.org
http://www.slideshare.com/agbiotec

More Related Content

What's hot

Docker training
Docker trainingDocker training
Docker training
Kiran Kumar
 
Microservices, Containers and Docker
Microservices, Containers and DockerMicroservices, Containers and Docker
Microservices, Containers and Docker
Ioannis Papapanagiotou
 
Visualizing a cloud using eucalyptus and xen
Visualizing a cloud using eucalyptus and xenVisualizing a cloud using eucalyptus and xen
Visualizing a cloud using eucalyptus and xen
A. Roy
 
Rishidot research briefing notes Cloudscaling
Rishidot research briefing notes   CloudscalingRishidot research briefing notes   Cloudscaling
Rishidot research briefing notes Cloudscaling
Rishidot Research
 
Build Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and HaduzillaBuild Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and Haduzilla
Jazz Yao-Tsung Wang
 
Build FAST Learning Apps with Docker and OpenPOWER
Build FAST Learning Apps with Docker and OpenPOWERBuild FAST Learning Apps with Docker and OpenPOWER
Build FAST Learning Apps with Docker and OpenPOWER
Indrajit Poddar
 
Distributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedDistributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learned
Wee Hyong Tok
 
Automating Your CloudStack Cloud with Puppet
Automating Your CloudStack Cloud with PuppetAutomating Your CloudStack Cloud with Puppet
Automating Your CloudStack Cloud with Puppet
buildacloud
 
The building blocks of docker.
The building blocks of docker.The building blocks of docker.
The building blocks of docker.
Chafik Belhaoues
 
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
Mihai Criveti
 
Hyper v r2 deep dive
Hyper v r2 deep diveHyper v r2 deep dive
Hyper v r2 deep dive
Concentrated Technology
 
Cloud orchestration risks
Cloud orchestration risksCloud orchestration risks
Cloud orchestration risks
Glib Pakharenko
 
node.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicornnode.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicorn
bcantrill
 
Docker
DockerDocker
Introduction to automated environment management with Docker Containers - for...
Introduction to automated environment management with Docker Containers - for...Introduction to automated environment management with Docker Containers - for...
Introduction to automated environment management with Docker Containers - for...
Lucas Jellema
 
Docker and kernel security
Docker and kernel securityDocker and kernel security
Docker and kernel security
smart_bit
 
The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)
Casey Bisson
 
Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.
Timothy St. Clair
 
24 23 jun17 2may17 16231 ijeecs latest_version (1) edit septian
24 23 jun17 2may17 16231 ijeecs latest_version (1) edit septian24 23 jun17 2may17 16231 ijeecs latest_version (1) edit septian
24 23 jun17 2may17 16231 ijeecs latest_version (1) edit septian
IAESIJEECS
 
Docker, Linux Containers, and Security: Does It Add Up?
Docker, Linux Containers, and Security: Does It Add Up?Docker, Linux Containers, and Security: Does It Add Up?
Docker, Linux Containers, and Security: Does It Add Up?
Jérôme Petazzoni
 

What's hot (20)

Docker training
Docker trainingDocker training
Docker training
 
Microservices, Containers and Docker
Microservices, Containers and DockerMicroservices, Containers and Docker
Microservices, Containers and Docker
 
Visualizing a cloud using eucalyptus and xen
Visualizing a cloud using eucalyptus and xenVisualizing a cloud using eucalyptus and xen
Visualizing a cloud using eucalyptus and xen
 
Rishidot research briefing notes Cloudscaling
Rishidot research briefing notes   CloudscalingRishidot research briefing notes   Cloudscaling
Rishidot research briefing notes Cloudscaling
 
Build Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and HaduzillaBuild Your Private Cloud with Ezilla and Haduzilla
Build Your Private Cloud with Ezilla and Haduzilla
 
Build FAST Learning Apps with Docker and OpenPOWER
Build FAST Learning Apps with Docker and OpenPOWERBuild FAST Learning Apps with Docker and OpenPOWER
Build FAST Learning Apps with Docker and OpenPOWER
 
Distributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedDistributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learned
 
Automating Your CloudStack Cloud with Puppet
Automating Your CloudStack Cloud with PuppetAutomating Your CloudStack Cloud with Puppet
Automating Your CloudStack Cloud with Puppet
 
The building blocks of docker.
The building blocks of docker.The building blocks of docker.
The building blocks of docker.
 
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
AnsibleFest 2021 - DevSecOps with Ansible, OpenShift Virtualization, Packer a...
 
Hyper v r2 deep dive
Hyper v r2 deep diveHyper v r2 deep dive
Hyper v r2 deep dive
 
Cloud orchestration risks
Cloud orchestration risksCloud orchestration risks
Cloud orchestration risks
 
node.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicornnode.js in production: Reflections on three years of riding the unicorn
node.js in production: Reflections on three years of riding the unicorn
 
Docker
DockerDocker
Docker
 
Introduction to automated environment management with Docker Containers - for...
Introduction to automated environment management with Docker Containers - for...Introduction to automated environment management with Docker Containers - for...
Introduction to automated environment management with Docker Containers - for...
 
Docker and kernel security
Docker and kernel securityDocker and kernel security
Docker and kernel security
 
The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)The Lies We Tell Our Code (#seascale 2015 04-22)
The Lies We Tell Our Code (#seascale 2015 04-22)
 
Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.
 
24 23 jun17 2may17 16231 ijeecs latest_version (1) edit septian
24 23 jun17 2may17 16231 ijeecs latest_version (1) edit septian24 23 jun17 2may17 16231 ijeecs latest_version (1) edit septian
24 23 jun17 2may17 16231 ijeecs latest_version (1) edit septian
 
Docker, Linux Containers, and Security: Does It Add Up?
Docker, Linux Containers, and Security: Does It Add Up?Docker, Linux Containers, and Security: Does It Add Up?
Docker, Linux Containers, and Security: Does It Add Up?
 

Viewers also liked

Service Environment
Service EnvironmentService Environment
Service Environment
lianti
 
Responders and Assessments Presentation
Responders  and  Assessments PresentationResponders  and  Assessments Presentation
Responders and Assessments Presentation
frewsmhuffman
 
2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...
Michel Dumontier
 
Abu Dhabi Banks First Update
Abu Dhabi Banks First UpdateAbu Dhabi Banks First Update
Abu Dhabi Banks First Update
arsalanmustafa
 
AD skater
AD skaterAD skater
AD skater
nmtorres86
 
Monica Cabbler - Profile
Monica Cabbler - ProfileMonica Cabbler - Profile
Monica Cabbler - Profile
Monica Cabbler
 
IPCC2010-1
IPCC2010-1IPCC2010-1
IPCC2010-1
Debopriyo Roy
 
Harmony 2011: Formalization of SBML models as OWL ontologies
Harmony 2011: Formalization of SBML models as OWL ontologiesHarmony 2011: Formalization of SBML models as OWL ontologies
Harmony 2011: Formalization of SBML models as OWL ontologies
Michel Dumontier
 
Eportfolio
EportfolioEportfolio
Eportfolio
Debopriyo Roy
 
Long Term Data Storage 2007
Long Term Data Storage 2007Long Term Data Storage 2007
Long Term Data Storage 2007
Andrei Khurshudov
 
Cio Summit 2008
Cio Summit 2008Cio Summit 2008
Cio Summit 2008
Tomas Gregorio MBA
 
Socialmediastenden
SocialmediastendenSocialmediastenden
Socialmediastenden
Sascha Funk
 
SecurVoice Call Recording
SecurVoice Call RecordingSecurVoice Call Recording
SecurVoice Call Recording
Crawford6003
 
2011 CANARIE User's Forum
2011 CANARIE User's Forum2011 CANARIE User's Forum
2011 CANARIE User's Forum
Michel Dumontier
 
Social media analysis for toronto 2010 mayoral election
Social media analysis for toronto 2010 mayoral electionSocial media analysis for toronto 2010 mayoral election
Social media analysis for toronto 2010 mayoral election
Patrick Gladney
 
Dokumentacia pechatni materiali
Dokumentacia pechatni materialiDokumentacia pechatni materiali
Dokumentacia pechatni materialiNural Tataoglu
 
Cinch Inch Loss Plan Testimonials
Cinch Inch Loss Plan TestimonialsCinch Inch Loss Plan Testimonials
Cinch Inch Loss Plan Testimonials
Johnine Bailey
 
Branding and design
Branding and designBranding and design
Branding and design
raneez
 
Wastewater 101: Decentralized approach, community process, and options
Wastewater 101: Decentralized approach, community process, and optionsWastewater 101: Decentralized approach, community process, and options
Wastewater 101: Decentralized approach, community process, and options
dmalchow
 

Viewers also liked (20)

Service Environment
Service EnvironmentService Environment
Service Environment
 
Responders and Assessments Presentation
Responders  and  Assessments PresentationResponders  and  Assessments Presentation
Responders and Assessments Presentation
 
2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...2010 CASCON - Towards a integrated network of data and services for the life ...
2010 CASCON - Towards a integrated network of data and services for the life ...
 
Abu Dhabi Banks First Update
Abu Dhabi Banks First UpdateAbu Dhabi Banks First Update
Abu Dhabi Banks First Update
 
AD skater
AD skaterAD skater
AD skater
 
Monica Cabbler - Profile
Monica Cabbler - ProfileMonica Cabbler - Profile
Monica Cabbler - Profile
 
Grupa Onet(2)
Grupa Onet(2)Grupa Onet(2)
Grupa Onet(2)
 
IPCC2010-1
IPCC2010-1IPCC2010-1
IPCC2010-1
 
Harmony 2011: Formalization of SBML models as OWL ontologies
Harmony 2011: Formalization of SBML models as OWL ontologiesHarmony 2011: Formalization of SBML models as OWL ontologies
Harmony 2011: Formalization of SBML models as OWL ontologies
 
Eportfolio
EportfolioEportfolio
Eportfolio
 
Long Term Data Storage 2007
Long Term Data Storage 2007Long Term Data Storage 2007
Long Term Data Storage 2007
 
Cio Summit 2008
Cio Summit 2008Cio Summit 2008
Cio Summit 2008
 
Socialmediastenden
SocialmediastendenSocialmediastenden
Socialmediastenden
 
SecurVoice Call Recording
SecurVoice Call RecordingSecurVoice Call Recording
SecurVoice Call Recording
 
2011 CANARIE User's Forum
2011 CANARIE User's Forum2011 CANARIE User's Forum
2011 CANARIE User's Forum
 
Social media analysis for toronto 2010 mayoral election
Social media analysis for toronto 2010 mayoral electionSocial media analysis for toronto 2010 mayoral election
Social media analysis for toronto 2010 mayoral election
 
Dokumentacia pechatni materiali
Dokumentacia pechatni materialiDokumentacia pechatni materiali
Dokumentacia pechatni materiali
 
Cinch Inch Loss Plan Testimonials
Cinch Inch Loss Plan TestimonialsCinch Inch Loss Plan Testimonials
Cinch Inch Loss Plan Testimonials
 
Branding and design
Branding and designBranding and design
Branding and design
 
Wastewater 101: Decentralized approach, community process, and options
Wastewater 101: Decentralized approach, community process, and optionsWastewater 101: Decentralized approach, community process, and options
Wastewater 101: Decentralized approach, community process, and options
 

Similar to CHPC Workshop Morning Session

Bosc2011 ntino-krampis-full
Bosc2011 ntino-krampis-fullBosc2011 ntino-krampis-full
Bosc2011 ntino-krampis-full
Bioinformatics Open Source Conference
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding Kubernetes
Tu Pham
 
Cloud computing components
Cloud computing componentsCloud computing components
Cloud computing components
PSG College of Technology
 
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
Machine Learning , Analytics & Cyber Security the Next Level Threat Analytics...
PranavPatil822557
 
Oscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby projectOscon 2017: Build your own container-based system with the Moby project
Oscon 2017: Build your own container-based system with the Moby project
Patrick Chanezon
 
Applied Security for Containers, OW2con'18, June 7-8, 2018, Paris
Applied Security for Containers, OW2con'18, June 7-8, 2018, ParisApplied Security for Containers, OW2con'18, June 7-8, 2018, Paris
Applied Security for Containers, OW2con'18, June 7-8, 2018, Paris
OW2
 
Federating Infrastructure as a Service cloud computing systems to create a un...
Federating Infrastructure as a Service cloud computing systems to create a un...Federating Infrastructure as a Service cloud computing systems to create a un...
Federating Infrastructure as a Service cloud computing systems to create a un...
David Wallom
 
Kubernetes and Container Technologies from Cloud Native Computing Foundation
Kubernetes and Container Technologies from Cloud Native Computing FoundationKubernetes and Container Technologies from Cloud Native Computing Foundation
Kubernetes and Container Technologies from Cloud Native Computing Foundation
Cloud Standards Customer Council
 
Cloud computing overview
Cloud computing overviewCloud computing overview
Cloud computing overview
karthik s
 
Cloud computing: highlights
Cloud computing: highlightsCloud computing: highlights
Cloud computing: highlights
Luís Bastião Silva
 
Understanding Docker and IBM Bluemix Container Service
Understanding Docker and IBM Bluemix Container ServiceUnderstanding Docker and IBM Bluemix Container Service
Understanding Docker and IBM Bluemix Container Service
Andrew Ferrier
 
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka
Microservices , Docker , CI/CD , Kubernetes Seminar - Sri Lanka
Mario Ishara Fernando
 
Raspberry pi x kubernetes x tensorflow
Raspberry pi x kubernetes x tensorflowRaspberry pi x kubernetes x tensorflow
Raspberry pi x kubernetes x tensorflow
霈萱 蔡
 
Advantages And Disadvantages Of Virtualization
Advantages And Disadvantages Of VirtualizationAdvantages And Disadvantages Of Virtualization
Advantages And Disadvantages Of Virtualization
Elizabeth Anderson
 
High Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudHigh Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the Cloud
The UberCloud
 
High Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the CloudHigh Performance Computing (HPC) and Engineering Simulations in the Cloud
High Performance Computing (HPC) and Engineering Simulations in the Cloud
Wolfgang Gentzsch
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
Vishwas N
 
ISC Cloud'13 - Hands-On Tutorial on “Building Your Cloud for HPC, Here & Now,...
ISC Cloud'13 - Hands-On Tutorial on “Building Your Cloud for HPC, Here & Now,...ISC Cloud'13 - Hands-On Tutorial on “Building Your Cloud for HPC, Here & Now,...
ISC Cloud'13 - Hands-On Tutorial on “Building Your Cloud for HPC, Here & Now,...
OpenNebula Project
 
Elevating your Continuous Delivery Strategy Above the Rolling Clouds
Elevating your Continuous Delivery Strategy Above the Rolling CloudsElevating your Continuous Delivery Strategy Above the Rolling Clouds
Elevating your Continuous Delivery Strategy Above the Rolling Clouds
Michael Elder
 
Docker - Ankara JUG, Nisan 2015
Docker - Ankara JUG, Nisan 2015Docker - Ankara JUG, Nisan 2015
Docker - Ankara JUG, Nisan 2015
Mustafa AKIN
 

CHPC Workshop Morning Session

  • 1. Cloud Computing Solutions for Genomics Across Geographic, Institutional and Economic Barriers Ntinos Krampis Asst. Professor J. Craig Venter Institute kkrampis@jcvi.org http://www.jcvi.org/cms/about/bios/kkrampis/
  • 2. Workshop Schedule Morning Session: Background Presentations and Prep 11:00 – 11:45 Introduction to Cloud Computing for Bioinformatics 11:45 – 12:00 Questions and Answers 12:00 – 12:30 Using Cloud BioLinux on the Amazon EC2 Cloud 12:30 – 13:00 Preparation: install Cloud Virtual Machines on laptops Afternoon Session: Hands on Session 14:00 – 14:30 Preparation: install Cloud Virtual Machines on laptops 14:30 – 16:00 Bioinformatic Analysis using Cloud BioLinux 16:30 – 17:30 Customized Bioinformatics Solutions for Participants
  • 3. A little bit of background information... ● Konstantinos (Ntinos) Krampis, started working at J. Craig Venter Inst. (JCVI) in 2009 ● Background training in Molecular Biology, PhD in Bioinformatics ● Research: cloud and high-performance computing, genome assembly ● Projects: Cloud BioLinux (cloudbiolinux.org) ● Taught Cloud BioLinux workshop at Univ. of Limpopo last May ● Slides available at http://www.slideshare.com/agbiotec ● Email me for slides, meeting, questions: kkrampis@jcvi.org
  • 4. J. Craig Venter Institute (JCVI) Large-scale genome sequencing and bioinformatics computing ●Human Microbiome Project (HMP): genome sequencing of microbes living in and on the human body ● Global Ocean Sampling (GOS) survey: genome sequencing of microbes sampled from oceans around the world
  • 5. JCVI: sequencing and computing infrastructure ● core sequencing laboratory: 454, Solexa, HiSeq, IonTorrent on the way ● dedicated bioinformatics department (57 bioinformaticians) ● large-scale computations, ~1000 node Sun Grid Engine (SGE) cluster
  • 6. Low-cost sequencing instruments ● small form-factor sequencers available: GS Junior by 454, MiSeq by Illumina ● bacterial, viral, small fungal genomes, sequencing for variant discovery ● sequencing as a standard technique in molecular biology and genetics ● RNA-seq (instead of microarrays) and ChIP-seq (instead of yeast 2-hybrid) http://www.gsjunior.com/ http://www.illumina.com/systems/miseq.ilmn
  • 7. More small laboratories doing genome sequencing [Figure: amount of sequencing vs. number of labs] Acquiring the sequence data is only the first step...
  • 8. Sequencing instruments shipped with minimal computational capacity ● Problem 1: sequencing data analysis requires high-performance and expensive computing hardware, for example: genome assembly, BLAST, genome annotation ● Problem 2: much bioinformatics software is difficult for biologists to install; it requires technical expertise with operating systems, compiling source code, etc.
  • 9. Each lab building their own informatics infrastructure ? ●small labs need additional funds to build computing clusters ●funds for bioinformaticians and software developers to maintain the clusters and software ● duplication of effort across labs ●sub-optimal utilization of the hardware due to small amounts of sequencing
  • 10. Large sequencing centers offering bioinformatics analysis services ? ● Bioinformatic Resource Centers (BRC) ●bioinformatic analysis coupled with sequencing of an organism ● mostly provide data browsing and few analysis tools to the public ●cannot serve the bioinformatic needs of every small lab acquiring a sequencing instrument ●need end-to-end solutions, users submit sequence data and get final annotation
  • 11. Solving Problem 1: using high-performance computing hardware available on the cloud ●cloud computing : high performance computers and data storage, remotely accessible through the Internet ●we are all using the cloud: Gmail, Google Docs, FaceBook; you store and access data on a remote computer ●cloud computers rented pay-as-you-go by service providers such as Amazon Elastic Compute Cloud (EC2)
  • 12. The Amazon EC2 cloud computing service ● a subsidiary company of Amazon.com, rents computing pay-as-you-go ● cloud computers cost $0.085 - $2 per hr (max 64GB memory and 8 processors) ● used by companies that need additional computers without investing in hardware ● physical locations: US East / West regions, EU, Singapore, Japan ● democratizes access to computing resources for researchers outside of institutional, economic or national boundaries ● 750 hours free for new users, sign up here: http://aws.amazon.com/free/ http://aws.amazon.com
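The pay-as-you-go pricing on this slide can be turned into a quick budget estimate. The following is only a sketch: the hourly rates are the two ends of the range quoted above, while the function name and the 24-hour example run are assumptions made for illustration.

```python
def ec2_compute_cost(hourly_rate_usd, instances, hours):
    """Pay-as-you-go cost: hourly rate x number of VMs x hours each VM runs."""
    return hourly_rate_usd * instances * hours


# Rates quoted on the slide: $0.085 (smallest) to $2 (largest) per hour.
low = ec2_compute_cost(0.085, instances=1, hours=24)
high = ec2_compute_cost(2.00, instances=1, hours=24)
print(f"one VM for 24 hours: ${low:.2f} to ${high:.2f}")
```

Under these assumptions a full day on a single cloud computer costs between roughly $2 and $48, depending on the capacity selected.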
  • 13. How does cloud computing work ? ● cloud computing evolved from virtualization technology ● operating system, bioinformatics software and data, are installed on a Virtual Machine (VM) ● VM is emulation of a computer system, in the form of a single, executable binary file ● runs inside a physical computer such as a laptop ● why Virtualization: simplify IT maintenance
  • 14. How does cloud computing work ? ● a VM is uploaded to the remote Amazon EC2 cloud computing service; it runs by renting computing capacity from Amazon EC2 (up to 64GB RAM / 8 core computers) ● bioinformatics software can be executed from anywhere in the world through a desktop computer with Internet access ● removes need for local computer clusters at each laboratory ● alternatively, if you have a cluster locally, it can run on a private cloud [Diagram: VMs on the Amazon EC2 cloud, reached over the Internet from local computers]
  • 15. Solving problem 2: pre-installed and configured bioinformatics software on cloud Virtual Machines ● Cloud BioLinux: a publicly accessible Virtual Machine (VM) on the Amazon EC2 cloud ● 100+ pre-configured and installed bioinformatics software tools ● sequence analysis, genome assembly, annotation, phylogeny, molecular modeling, gene expression ● a researcher can initiate a practically unlimited number of VMs for large-scale data analysis and access them using a local computer
  • 16. Cloud BioLinux for Bioinformatics ● how the Cloud BioLinux project came to be, what it can offer to small labs for genome sequence analysis ● where and how do I run Cloud BioLinux, especially if I am not a computer expert ● besides end-users, bioinformatics developers are provided a framework for modifying and sharing VM configurations and data
  • 17. Creating Cloud BioLinux ● JCVI bioinformatics cloud computing research ● NEBC BioLinux software repository (tinyurl.com/BioLinux-NEBC) ● community effort at BOSC 2009 – 11 ● initially: a VM on Amazon EC2 with the tools copied and installed from the NEBC repository ● now: developer's framework for creating customized cloud VMs ● major contributors: http://www.cloudbiolinux.org
  • 18. Research at JCVI with Cloud BioLinux ● Eucalyptus private cloud currently installed at JCVI, OpenStack on the way ● open-source cloud platforms, fully compatible with Amazon EC2 (identical API) ● easy to set up on a local computer cluster, comes with Ubuntu Linux server ● develop VMs in-house with complex bioinformatics pipelines pre-installed and upload to Amazon EC2 for public access ● free to use on your laptop with VirtualBox
  • 19. 10 min of questions....
  • 20. Research at JCVI with Cloud BioLinux ● bioinformatics data analysis pipelines have complex dependencies: operating system, software libraries, reference databases etc. ● approach: pre-install pipelines and all dependencies in a single binary VM file using a private cloud ● upload VM on Amazon EC2: pipelines ready to execute, no need to purchase hardware [Figure label: JCVI - GSC]
  • 21.
  • 22.
  • 23.
  • 24.
  • 25. Research at JCVI with Cloud BioLinux ● Funded by NIAID until 2013 ● port complex bioinformatic pipelines to the Amazon EC2 cloud ● focus on viral reads-to-annotation data pipelines ● benefits to small laboratories that lack resources ● if you own a cluster: download and run the VM on your private Eucalyptus or OpenStack cloud [Figure label: JCVI - GSC]
  • 26. Running Cloud BioLinux on the Amazon EC2 cloud Account on the Amazon EC2 cloud http://aws.amazon.com/ec2
  • 27. Launch Cloud BioLinux through the EC2 cloud console http://tinyurl.com/cloud-biolinux-tutorial
  • 28. Cloud BioLinux launch wizard: steps 1 & 2 1. go to the “Community AMIs” tab, specify the Cloud BioLinux VM identifier (most recent update: cloudbiolinux.org) 2. select computational capacity
  • 29. Cloud BioLinux launch wizard: step 3 3. specify a password for login to Cloud BioLinux in the “User Data” box
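The three wizard steps above (pick the VM identifier, select capacity, put a login password in "User Data") map onto the parameters of a single launch request. The sketch below only assembles those parameters as a dictionary and contacts no cloud service. The keys are the ones accepted by the `run_instances` call of the boto3 EC2 client; the AMI identifier, instance type and password are placeholders, and passing the password as the raw User Data string is an assumption based on the slide rather than a documented format.

```python
def build_launch_request(ami_id, instance_type, password):
    """Collect the launch wizard's three inputs into one request dictionary."""
    return {
        "ImageId": ami_id,              # step 1: Cloud BioLinux VM identifier
        "InstanceType": instance_type,  # step 2: computational capacity
        "MinCount": 1,                  # launch exactly one VM
        "MaxCount": 1,
        "UserData": password,           # step 3: login password (assumed format)
    }


# Placeholder values: look up the current identifier on cloudbiolinux.org.
request = build_launch_request("ami-xxxxxxxx", "m1.large", "choose-a-password")
print(request)

# With boto3 installed and AWS credentials configured, the request could be
# submitted with: boto3.client("ec2").run_instances(**request)
```

Keeping the request as plain data makes it easy to review the chosen capacity, and therefore the hourly cost, before anything is launched.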
  • 31.
  • 32. Distributing Data Analysis Results with Cloud BioLinux
  • 33. Distributing Data Analysis Results with Cloud BioLinux Whole System Snapshot Exchange ● how difficult is it to share bioinformatics work on your computer with a collaborator ? ● capture the state of the computing system (OS + software), data, analysis results ● make VM snapshots: executable binary file, replica of a running VM ● distribute a VM snapshot with pre-installed software and data analysis results ● collaborators can replicate, re-run, add to your analysis results ● a snapshot can be shared directly on the Amazon cloud, downloaded on a private cloud or run on desktop using virtualization software
  • 34. Cloud BioLinux: whole system snapshot exchange storage cost: $0.10 / GB / month
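As a rough illustration of what keeping a snapshot online costs, the storage rate on this slide can be multiplied out. This is a minimal sketch: the rate is the one quoted above, and the 20 GB snapshot size and six-month period are hypothetical values chosen for the example.

```python
STORAGE_USD_PER_GB_MONTH = 0.10  # rate quoted on the slide


def snapshot_storage_cost(size_gb, months=1):
    """Cost of storing a VM snapshot: size x months x monthly rate per GB."""
    return size_gb * months * STORAGE_USD_PER_GB_MONTH


# Hypothetical example: a 20 GB snapshot kept available for six months.
print(f"${snapshot_storage_cost(20, months=6):.2f}")
```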
  • 35. Cloud BioLinux: whole system snapshot exchange authorize access to the VM: public or for certain users other researchers can access the VM with all the software, data, analysis results directly on the cloud
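The choice between public access and access for certain users can be written down as a small request. The sketch below builds the parameters without sending them. The dictionary shape follows the `modify_image_attribute` call of the boto3 EC2 client (`LaunchPermission` with an `Add` list); the image identifier and the account number are placeholders, and the helper name is invented for this example.

```python
def build_share_request(image_id, account_ids=None):
    """Grant launch access to specific AWS accounts, or to everyone if none given."""
    if account_ids:
        grants = [{"UserId": account} for account in account_ids]  # certain users
    else:
        grants = [{"Group": "all"}]                                # public
    return {"ImageId": image_id, "LaunchPermission": {"Add": grants}}


# Placeholder identifiers, for illustration only.
print(build_share_request("ami-xxxxxxxx"))
print(build_share_request("ami-xxxxxxxx", ["123456789012"]))

# With boto3 installed and credentials configured, a request could be sent
# with: boto3.client("ec2").modify_image_attribute(**request)
```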
  • 36. Cloud BioLinux for Software Developers ● Issue 1: for researchers with sensitive data a public cloud might not be an option ● Problem 1: moving VMs across clouds is not trivial, need low level operations ● Issue 2: bioinformatic specializations (e.g. sequencing, phylogeny, protein structure) ● Problem 2: one VM with many tools to fit all becomes over-sized ● Cloud BioLinux VM deployment framework
  • 37. Cloud BioLinux for Software Developers ● framework to customize software installed in cloud VM / image ● based on Python Fabric automated deployment tool ● software components listed in simple text configuration files ● edit the files to mix and match software according to your needs ● use source code repository to share configuration files for customized VMs ● start with a bare-bones VM on Amazon EC2 or Eucalyptus private cloud ● Fabric scripts automatically install specified software based on configuration files ● Free, available from: https://github.com/chapmanb/cloudbiolinux
  • 38. Software domains in Cloud BioLinux: genome sequencing, de novo assembly, annotation, phylogeny, molecular structures, gene expression analysis [Figure: high-level configuration describing software groups; for each group, individual bioinformatics tools]
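The "mix and match" idea on this slide can be sketched in a few lines of Python. This is not the project's actual file layout or its Fabric scripts: the one-package-per-line format, the group names, the package names and all function names are assumptions made for illustration, and the sketch only builds an `apt-get` command list instead of running anything on a VM.

```python
def parse_package_list(text):
    """Read one package name per line, ignoring blank lines and '#' comments."""
    packages = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            packages.append(name)
    return packages


def select_packages(group_files, wanted_groups):
    """Merge the package lists of the chosen groups, dropping duplicates."""
    selected = []
    for group in wanted_groups:
        for name in parse_package_list(group_files[group]):
            if name not in selected:
                selected.append(name)
    return selected


def install_command(packages):
    """Build (but do not run) the install command for a Debian/Ubuntu-based VM."""
    return ["apt-get", "install", "-y", *packages]


# Hypothetical configuration text, one string per software group.
group_files = {
    "assembly": "velvet\n# short-read assembler\n",
    "phylogeny": "mrbayes\nclustalw  # multiple alignment\n",
    "structure": "pymol\n",
}

# A lab interested only in assembly and phylogeny picks those two groups.
chosen = select_packages(group_files, ["assembly", "phylogeny"])
print(" ".join(install_command(chosen)))
```

Because the selection lives in plain text, a customized VM can be described and shared through a source code repository rather than by copying the VM image itself.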
  • 39.
  • 40. Acknowledgments & Credits Brad Chapman - development of the Fabric scripts, website Tim Booth, Mesude Bicak, Dawn Field – BioLinux 6.0 development Enis Afgan – Cloudman and Cloud BioLinux integration Members of the Cloud Biolinux community: http://groups.google.com/group/cloudbiolinux And again our contacts: kkrampis@jcvi.org Thank you ! http://www.cloudbiolinux.org http://www.slideshare.com/agbiotec