Motivation     Solution             Implementation   Demonstration




          Developing distributed analysis
         pipelines with shared community
        resources using CloudBioLinux and
                    CloudMan

                     Brad Chapman
                   Bioinformatics Core
             Harvard School of Public Health


                          22 September 2011



Acknowledgements

     CloudBioLinux – Ntino Krampis, Tim
             Booth, Dawn Field, Pjotr Prins and
             CloudBioLinux community
     CloudMan – Enis Afgan, James Taylor
     Exome pipeline – HSPH, MGH, Win Hide,
             Oliver Hofmann



Follow along



     http://www.slideshare.net/chapmanb



Cue the “lots of data” slide


     ls -lh fastq/
     24G 1_110907_AD08A5ACXX_1_fastq.txt
     21G 1_110907_AD08A5ACXX_2_fastq.txt
     24G 2_110907_AD08A5ACXX_1_fastq.txt
     20G 2_110907_AD08A5ACXX_2_fastq.txt



Rapidly changing tools



Science – fundamental challenge



      75%     one-off experimental
      25%     reused code



Unfortunate result




     http://news.ycombinator.com/item?id=2735537



Hard choices

     Computation
     Demands flexible, well-architected, scalable
     code
     Science
     Requires rapid turnaround and
     experimentation



2 solutions (at least)



         1   Improve your programming skills
         2   Utilize community resources



Become a better coder




     http://software-carpentry.org/



Community resources


             Share painful parts
             Base of well-written, scalable code

     Start each problem from a higher level of
     abstraction



Community components


             CloudBioLinux – install software
             CloudMan – manage cluster
             Exome analysis pipeline – do science



CloudBioLinux

             Amazon image with bioinformatics
             software and libraries
             Automated build framework
             Community effort to maintain and
             extend
     http://cloudbiolinux.org



CloudMan

             SGE cluster plus automation
             Web interface and monitoring
             Persistence and sharing
             Powers the Galaxy Cloud offering
     http://wiki.g2.bx.psu.edu/Admin/Cloud



Exome analysis pipeline

             Existing algorithms
                 Aligners – Bowtie, BWA
                 Variation – GATK
                 Quality assessment – FastQC, Picard
             Messaging system – AMQP
     https://github.com/chapmanb/bcbb/tree/master/nextgen



Fastq lane processing



Sample processing



Variant calling



Parallelization



Amazon


             Virtual machines
                  Share
                  Reproduce
                  Coordinate
             Accessibility



What are we going to do?


             Use AWS console to boot
             CloudBioLinux
             Set up CloudMan in AWS console
             Boot CloudMan instance with demo
             data



What are we going to do? (continued)

             Manage cluster with CloudMan interface
             Set up messaging queue
             Run pipeline, examine results
             Share cluster



CloudBioLinux

             Select and launch CloudBioLinux AMI
             from AWS console
             Connect
                 FreeNX graphical client
                 ssh

     Full tutorial PDF: http://j.mp/nnh5TE



Prep work


             Sign up for AWS account:
             http://aws.amazon.com/
             Create login key pair in AWS Console
             Install NX client:
             http://www.nomachine.com/select-package-client.php
https://console.aws.amazon.com/ec2/
Select CloudBioLinux image from Community AMIs
Enter NX password in user-data (freenxpass: secret)
Launch CloudBioLinux server
Get external hostname from Instances page
Connect using NX client, with ubuntu user and secret password
Connect with ssh, using private ssh key-pair
Terminate the server when finished



Set up CloudMan in AWS console


             Create a custom security group

     Full tutorial:
     http://wiki.g2.bx.psu.edu/Admin/Cloud
Create security group rules following wiki instructions
Final security group specifications



Boot CloudMan instance with demo data


             Start server
             Pass in CloudMan user data
             Load shared CloudMan image
Follow same procedure as CloudBioLinux
Create CloudMan user-data file


 cluster_name: cbldemo
 password: cbl
 access_key: your_access_key
 secret_key: your_long_AWS_secret_key
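
A minimal sketch, assuming Python 3 on your local machine, of producing the user-data file shown above before uploading it in the AWS console. The function names `render_user_data` and `write_user_data` and the output file name are hypothetical; only the four keys come from the slide. Values are written as plain text, so a value containing YAML special characters would need quoting by hand.

```python
import os
import tempfile


def render_user_data(cluster_name, password, access_key, secret_key):
    """Return CloudMan user-data text with the four keys shown on the slide."""
    fields = [
        ("cluster_name", cluster_name),
        ("password", password),
        ("access_key", access_key),
        ("secret_key", secret_key),
    ]
    return "".join("%s: %s\n" % (key, value) for key, value in fields)


def write_user_data(path, text):
    """Write the file with owner-only permissions, since it holds AWS credentials."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)


if __name__ == "__main__":
    # Placeholder credentials, exactly as on the slide; substitute your own.
    text = render_user_data("cbldemo", "cbl", "your_access_key",
                            "your_long_AWS_secret_key")
    out_path = os.path.join(tempfile.mkdtemp(), "cloudman_user_data.txt")
    write_user_data(out_path, text)
    print(text)
```

The resulting file is what gets passed in the "Provide user-data from file" step.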
Provide user-data from file
Choose created security group
Login to instance with password from user-data



CloudMan share-an-instance


             Persist data in a CloudMan cluster
             Easily sharable
     For this demo
     cm-b53c6f1223f966914df347687f6fc818/shared/2011-10-07--14-00
Import shared instance with demo data



Manage cluster with CloudMan


             Web-based console
             Monitor running processes
             Add nodes to cluster as needed
CloudMan console to interact with cluster
Add node to cluster



Set up messaging communication


             Command line access to server
             Adjust RabbitMQ configuration
             Set up messaging queue



Command line access to server

     ssh -i ~/.ec2/id-kunkel.keypair \
         ubuntu@ec2-67-202-14-208.compute-1.amazonaws.com


     Follow approach used to connect to
     CloudBioLinux cluster; can also connect via
     NX




     Edit /export/data/galaxy/universe_wsgi.ini
     configuration file to add internal host name.

         [galaxy_amqp]
         host = ip-10-125-10-182.ec2.internal
         port = 5672
         userid = biouser
         password = tester
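
To check the edit, the section can be read back with Python's standard `configparser`. This is a sketch assuming Python 3; the helper name `read_amqp_settings` is hypothetical, and the example text simply repeats the values from the slide rather than reading the real `universe_wsgi.ini`.

```python
import configparser

# Same section as on the slide; on the server you would read the file instead.
EXAMPLE_INI = """
[galaxy_amqp]
host = ip-10-125-10-182.ec2.internal
port = 5672
userid = biouser
password = tester
"""


def read_amqp_settings(ini_text):
    """Return the [galaxy_amqp] settings as a dict, with the port as an int."""
    # interpolation=None keeps any literal '%' in other options from raising.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["galaxy_amqp"]
    return {
        "host": section["host"],
        "port": section.getint("port"),
        "userid": section["userid"],
        "password": section["password"],
    }


settings = read_amqp_settings(EXAMPLE_INI)
print(settings["host"], settings["port"])
```

The host must be the internal EC2 name of the master node, so worker nodes can reach RabbitMQ from inside the cluster.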



Set up messaging queue

     $ sudo rabbitmqctl add_user biouser tester
      creating user 'biouser' ...
      ...done.
     $ sudo rabbitmqctl add_vhost bionextgen
      creating vhost 'bionextgen' ...
      ...done.
     $ sudo rabbitmqctl set_permissions -p bionextgen \
            biouser ".*" ".*" ".*"
      setting permissions for user 'biouser' in vhost 'bionextgen' ..
      ...done.
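
The three commands above can be scripted so the same user, password and virtual host are used consistently. This is a sketch assuming Python 3; the function names are hypothetical, and it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def rabbitmq_setup_commands(user, password, vhost):
    """Return the three rabbitmqctl invocations from the slide as argument lists."""
    return [
        ["sudo", "rabbitmqctl", "add_user", user, password],
        ["sudo", "rabbitmqctl", "add_vhost", vhost],
        # Grant configure, write and read permissions on every resource.
        ["sudo", "rabbitmqctl", "set_permissions", "-p", vhost,
         user, ".*", ".*", ".*"],
    ]


def run_commands(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for args in commands:
        if dry_run:
            print(" ".join(shlex.quote(arg) for arg in args))
        else:
            subprocess.check_call(args)


commands = rabbitmq_setup_commands("biouser", "tester", "bionextgen")
run_commands(commands)
```

The user and password must match the `[galaxy_amqp]` section, and the virtual host must match `rabbitmq_vhost` in the system-level configuration.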



Run pipeline, examine results


             Ready to run distributed pipeline
             Demo data – two paired end fastq lanes
             Variant calling workflow



Input sequence data


         $ ls -1 /export/data/exome_example/fastq/
         7_100326_FC6107FAAXX_1-chr22.fastq
         7_100326_FC6107FAAXX_2-chr22.fastq
         8_100326_FC6107FAAXX_1-chr22.fastq
         8_100326_FC6107FAAXX_2-chr22.fastq
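
The file names encode lane, run date, flowcell and read number. A sketch, assuming Python 3 and assuming that `lane_date_flowcell_read` layout holds for every file, of grouping the paired-end reads by lane. The regular expression and the function name `pair_fastq_by_lane` are illustrative, not taken from the pipeline code.

```python
import re
from collections import defaultdict

# lane _ date(YYMMDD) _ flowcell _ read(1 or 2), followed by '-', '_' or '.'
FASTQ_NAME = re.compile(
    r"^(?P<lane>\d+)_(?P<date>\d{6})_(?P<flowcell>[A-Za-z0-9]+)_(?P<read>[12])(?=[-_.])"
)


def pair_fastq_by_lane(filenames):
    """Group fastq file names by (lane, date, flowcell), ordered by read number."""
    lanes = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # Ignore files that do not follow the naming layout.
        key = (int(match.group("lane")), match.group("date"),
               match.group("flowcell"))
        lanes[key][int(match.group("read"))] = name
    return {key: [reads[number] for number in sorted(reads)]
            for key, reads in sorted(lanes.items())}


demo_files = [
    "7_100326_FC6107FAAXX_1-chr22.fastq",
    "7_100326_FC6107FAAXX_2-chr22.fastq",
    "8_100326_FC6107FAAXX_1-chr22.fastq",
    "8_100326_FC6107FAAXX_2-chr22.fastq",
]
pairs = pair_fastq_by_lane(demo_files)
for key, files in pairs.items():
    print(key, files)
```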



Run level: YAML Configuration
     $ cat /export/data/exome_example/config/run_info.yaml
     ---
     fc_date: '100326'
     fc_name: FC6107FAAXX
     details:
       - files: [7_100326_FC6107FAAXX_1-chr22.fastq,
                 7_100326_FC6107FAAXX_2-chr22.fastq]
         lane: 7
         description: Test replicate 1
         analysis: SNP calling
         genome_build: hg19
         algorithm:
           quality_format: Standard
           hybrid_bait: hybrid_selection/baits.bed
           hybrid_target: hybrid_selection/targets.bed



System level: YAML Configuration
     $ cat /export/data/galaxy/post_process.yaml
     ---
     program:
       bowtie: bowtie
       bwa: bwa
       ucsc_bigwig: wigToBigWig
       picard: /usr/share/java/picard
       gatk: /usr/share/java/gatk
       snpEff: /usr/share/java/snpeff
       fastqc: fastqc
     distributed:
       cluster_platform: sge
       platform_args: '-q all.q'
       cores_per_host: 1
       rabbitmq_vhost: bionextgen
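
Before launching a run it can help to confirm that every entry in the `program` section resolves on the machine. The sketch below assumes Python 3 and copies the entries into a dictionary by hand instead of parsing the YAML; `missing_programs` is a hypothetical helper. Bare command names are looked up on `PATH`, and absolute paths are checked for existence.

```python
import os
import shutil


def missing_programs(programs):
    """Return the names whose command or path cannot be found on this machine."""
    missing = []
    for name, command in sorted(programs.items()):
        if os.path.isabs(command):
            found = os.path.exists(command)   # e.g. a directory of jar files
        else:
            found = shutil.which(command) is not None
        if not found:
            missing.append(name)
    return missing


# Entries copied from the program section on the slide.
PROGRAMS = {
    "bowtie": "bowtie",
    "bwa": "bwa",
    "ucsc_bigwig": "wigToBigWig",
    "picard": "/usr/share/java/picard",
    "gatk": "/usr/share/java/gatk",
    "snpEff": "/usr/share/java/snpeff",
    "fastqc": "fastqc",
}

print("Not found here:", missing_programs(PROGRAMS))
```

On a CloudBioLinux image the list would be expected to come back empty; elsewhere it shows what still needs installing.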



Run exome pipeline


     $ cd /export/data/work
     $ distributed_nextgen_pipeline.py \
         /export/data/galaxy/post_process.yaml \
         /export/data/exome_example/fastq \
         /export/data/exome_example/config/run_info.yaml
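
The same launch can be wrapped in a small script that checks the three inputs exist before starting. This is a sketch assuming Python 3; `pipeline_command` is a hypothetical helper, and the script name and argument order are the ones shown on the slide.

```python
import os
import subprocess


def pipeline_command(config_file, fastq_dir, run_info):
    """Build the launch command, failing early if any input path is missing."""
    for path in (config_file, fastq_dir, run_info):
        if not os.path.exists(path):
            raise FileNotFoundError(path)
    return ["distributed_nextgen_pipeline.py", config_file, fastq_dir, run_info]


if __name__ == "__main__":
    try:
        command = pipeline_command(
            "/export/data/galaxy/post_process.yaml",
            "/export/data/exome_example/fastq",
            "/export/data/exome_example/config/run_info.yaml",
        )
    except FileNotFoundError as error:
        print("Missing input:", error)
    else:
        # Run from the work directory so output lands where the demo expects it.
        subprocess.check_call(command, cwd="/export/data/work")
```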



What just happened?



Monitoring: SGE queues


     $ qstat
     job-ID prior name state submit/start at    queue
     --------------------------------------------------------------
     1 0.55500 nextgen_an r 18:16:32 all.q@ip-10-125-10-182.ec2.int
     2 0.55500 nextgen_an r 18:16:32 all.q@ip-10-86-254-105.ec2.int
     3 0.55500 automated_ r 18:16:47 all.q@ip-10-125-10-182.ec2.int
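
A sketch, assuming Python 3, of summarising this listing to see how jobs are spread across nodes. It parses the abbreviated six-column layout shown on the slide; full `qstat` output has additional columns (user, date, slots), so the field positions here are an assumption tied to this example. `running_jobs_per_host` is a hypothetical helper.

```python
from collections import Counter

# Listing in the trimmed layout used on the slide.
QSTAT_OUTPUT = """\
job-ID prior name state submit/start at    queue
--------------------------------------------------------------
1 0.55500 nextgen_an r 18:16:32 all.q@ip-10-125-10-182.ec2.int
2 0.55500 nextgen_an r 18:16:32 all.q@ip-10-86-254-105.ec2.int
3 0.55500 automated_ r 18:16:47 all.q@ip-10-125-10-182.ec2.int
"""


def running_jobs_per_host(qstat_text):
    """Count jobs in state 'r' for each host named in the queue column."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job rows have six fields and start with a numeric job ID.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        state, queue = fields[3], fields[5]
        if state == "r" and "@" in queue:
            counts[queue.split("@", 1)[1]] += 1
    return dict(counts)


jobs = running_jobs_per_host(QSTAT_OUTPUT)
for host, count in sorted(jobs.items()):
    print(host, count)
```

Here the two `nextgen_an` jobs are the analysis servers, one per node, and `automated_` is the controlling script.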



Monitoring: Analysis directory


     $ cd /export/data/work
     $ ls -lh
     drwxr-xr-x 4.0 alignments
     -rw-r--r-- 2.0K automated_initial_analysis.py.o11
     drwxr-xr-x   33 log
     -rw-r--r-- 15K nextgen_analysis_server.py.o10
     -rw-r--r-- 15K nextgen_analysis_server.py.o9
     drwxr-xr-x 102 tmp



Monitoring: Log files

     $ less nextgen_analysis_server.py.o10
      INFO: nextgen_pipeline: Processing sample: Test replicate 2;
        lane 8; reference genome hg19; researcher ;
        analysis method SNP calling
      INFO: nextgen_pipeline:
        Aligning lane 8_100326_FC6107FAAXX with bwa aligner
      INFO: nextgen_pipeline:
        Combining and preparing wig file [u'', u'Test replicate 2']
      INFO: nextgen_pipeline:
        Recalibrating [u'', u'Test replicate 2'] with GATK



Retrieve results: Copy files

     $ upload_to_galaxy.py \
         /export/data/galaxy/post_process.yaml \
         /export/data/exome_example/fastq \
         /export/data/work \
         /export/data/exome_example/config/run_info.yaml


     Final files copied into new directory; allows
     cleanup of analysis directory



Retrieve results: Output directory


     $ ls -lh /export/data/galaxy/storage/100326_FC6107FAAXX/7
     -rw-r--r-- 38M 7_100326_FC6107FAAXX.bam
     -rw-r--r-- 22M 7_100326_FC6107FAAXX-coverage.bigwig
     -rw-r--r-- 72M 7_100326_FC6107FAAXX-gatkrecal.bam
     -rw-r--r-- 109K 7_100326_FC6107FAAXX-snp-effects.tsv
     -rw-r--r-- 827K 7_100326_FC6107FAAXX-snp-filter.vcf
     -rw-r--r-- 1.6M 7_100326_FC6107FAAXX-summary.pdf



Share results


             Share-an-instance
             Uses CloudMan web interface
             Reproducible research
                 CloudBioLinux AMI – software
                 CloudMan – data and configuration
CloudMan console enables push button sharing
Can make public or available to specific collaborators
When finished, turn everything off through CloudMan



Summary

     CloudBioLinux

             Shared machine image of biological
             software
             Boot from AWS console
             Connect with NX graphical client and
             ssh



Summary

     CloudMan

             Cluster setup and management
             Boot from share-an-instance
             Manage cluster through web interface
             Share final results



Summary

     Exome pipeline

             Parallel framework for running analyses
             Run using automated scripts
             Extract alignments, variant calls and
             summary information



Future: interfaces make it easier




     https://bitbucket.org/hbc/galaxy-central-hbc



Future: Simplified file selection



Future: Top level parameters



Future: Galaxy data libraries



Future: Galaxy analysis



Future: External UCSC visualization



Read more

             Step-by-step instructions
             http://j.mp/rp69nx
             Approaches to parallelism
             http://j.mp/nPQHcm
             Future work
             http://bcbio.wordpress.com

Europe Cloud Summit - Security hardening of public cloud services
 
AWS Code Services
AWS Code ServicesAWS Code Services
AWS Code Services
 
TIAD : Automate everything with Google Cloud
TIAD : Automate everything with Google CloudTIAD : Automate everything with Google Cloud
TIAD : Automate everything with Google Cloud
 

Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan

  • 1. Motivation Solution Implementation Demonstration Developing distributed analysis pipelines with shared community resources using CloudBioLinux and CloudMan Brad Chapman Bioinformatics Core Harvard School of Public Health 22 September 2011
  • 2. Acknowledgements
      CloudBioLinux – Ntino Krampis, Tim Booth, Dawn Field, Pjotr Prins and CloudBioLinux community
      CloudMan – Enis Afgan, James Taylor
      Exome pipeline – HSPH, MGH, Win Hide, Oliver Hofmann
  • 3. Follow along
      http://www.slideshare.net/chapmanb
  • 4. Cue the “lots of data” slide
      ls -lh fastq/
      24G 1_110907_AD08A5ACXX_1_fastq.txt
      21G 1_110907_AD08A5ACXX_2_fastq.txt
      24G 2_110907_AD08A5ACXX_1_fastq.txt
      20G 2_110907_AD08A5ACXX_2_fastq.txt
  • 5. Rapidly changing tools
  • 6. Science – fundamental challenge
      75% one-off experimental
      25% reused code
  • 7. Unfortunate result
      http://news.ycombinator.com/item?id=2735537
  • 8. Hard choices
      Computation
        Demands flexible, well-architected, scalable code
      Science
        Requires rapid turnaround and experimentation
  • 9. 2 solutions (at least)
      1. Improve your programming skills
      2. Utilize community resources
  • 10. Become a better coder
      http://software-carpentry.org/
  • 11. Community resources
      Share painful parts
      Base of well-written, scalable code
      Start each problem from a higher level of abstraction
  • 12. Community components
      CloudBioLinux – install software
      CloudMan – manage cluster
      Exome analysis pipeline – do science
  • 13. CloudBioLinux
      Amazon image with bioinformatics software and libraries
      Automated build framework
      Community effort to maintain and extend
      http://cloudbiolinux.org
  • 14. CloudMan
      SGE cluster plus automation
      Web interface and monitoring
      Persistence and sharing
      Powers the Galaxy Cloud offering
      http://wiki.g2.bx.psu.edu/Admin/Cloud
  • 15. Exome analysis pipeline
      Existing algorithms
      Aligners – Bowtie, BWA
      Variation – GATK
      Quality assessment – FastQC, Picard
      Messaging system – AMQP
      https://github.com/chapmanb/bcbb/tree/master/nextgen
  • 16. Fastq lane processing
  • 17. Sample processing
  • 18. Variant calling
  • 19. Parallelization
  • 21. Amazon Virtual machines Share Reproduce Coordinate Accessibility
  • 22. What are we going to do?
      Use AWS console to boot CloudBioLinux
      Set up CloudMan in AWS console
      Boot CloudMan instance with demo data
  • 23. What are we going to do? (continued)
      Manage cluster with CloudMan interface
      Set up messaging queue
      Run pipeline, examine results
      Share cluster
  • 24. CloudBioLinux
      Select and launch CloudBioLinux AMI from AWS console
      Connect: FreeNX graphical client, ssh
      Full tutorial PDF: http://j.mp/nnh5TE
  • 25. Prep work
      Sign up for AWS account: http://aws.amazon.com/
      Create login key pair in AWS Console
      Install NX client: http://www.nomachine.com/select-package-client.php
  • 27. Select CloudBioLinux image from Community AMIs
  • 28. Enter NX password in user-data (freenxpass: secret)
  • 30. Get external hostname from Instances page
  • 31. Connect using NX client, with ubuntu user and secret password
  • 33. Connect with ssh, using private ssh key-pair
  • 34. Terminate the server when finished
  • 35. Set up CloudMan in AWS console
      Create a custom security group
      Full tutorial: http://wiki.g2.bx.psu.edu/Admin/Cloud
  • 36. Create security group rules following wiki instructions
  • 37. Final security group specifications
  • 38. Boot CloudMan instance with demo data
      Start server
      Pass in CloudMan user data
      Load shared CloudMan image
  • 39. Follow same procedure as CloudBioLinux
  • 40. Create CloudMan user-data file
      cluster_name: cbldemo
      password: cbl
      access_key: your_access_key
      secret_key: your_long_AWS_secret_key
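
The user-data file on slide 40 is four `key: value` lines. A minimal sketch of generating that text from Python follows; the function name `render_user_data` and the validation rules are illustrative assumptions, not part of CloudMan.

```python
def render_user_data(cluster_name, password, access_key, secret_key):
    """Render the four key/value lines from the slide as plain text.

    Hypothetical helper: only the four keys shown on the slide are written.
    """
    fields = [
        ("cluster_name", cluster_name),
        ("password", password),
        ("access_key", access_key),
        ("secret_key", secret_key),
    ]
    for key, value in fields:
        # Assumed sanity check: reject empty or multi-line values.
        if not value or "\n" in value:
            raise ValueError(f"invalid value for {key!r}")
    return "".join(f"{key}: {value}\n" for key, value in fields)


if __name__ == "__main__":
    # Placeholder credentials copied from the slide; replace with real ones.
    text = render_user_data(
        "cbldemo", "cbl", "your_access_key", "your_long_AWS_secret_key"
    )
    print(text, end="")
```

The printed text can be pasted into the user-data field of the AWS console when launching the instance.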
  • 43. Log in to instance with password from user-data
  • 44. CloudMan share-an-instance
      Persist data in a CloudMan cluster
      Easily sharable
      For this demo:
      cm-b53c6f1223f966914df347687f6fc818/shared/2011-10-07--14-00
  • 45. Import shared instance with demo data
  • 46. Manage cluster with CloudMan
      Web-based console
      Monitor running processes
      Add nodes to cluster as needed
  • 47. CloudMan console to interact with cluster
  • 48. Add node to cluster
  • 49. Set up messaging communication
      Command line access to server
      Adjust RabbitMQ configuration
      Set up messaging queue
  • 50. Command line access to server
      ssh -i ~/.ec2/id-kunkel.keypair ubuntu@ec2-67-202-14-208.compute-1.amazonaws.com
      Follow approach used to connect to CloudBioLinux cluster; can also connect via NX
  • 51. Edit /export/data/galaxy/universe_wsgi.ini configuration file to add internal host name.
      [galaxy_amqp]
      host = ip-10-125-10-182.ec2.internal
      port = 5672
      userid = biouser
      password = tester
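
To confirm that the edited `[galaxy_amqp]` section reads back as intended, it can be parsed with Python's standard `configparser`. This is a sketch only: the function name `read_amqp_settings` is hypothetical, and the example text contains just the five lines from slide 51 rather than a full `universe_wsgi.ini`.

```python
import configparser

# The section exactly as shown on the slide.
EXAMPLE_INI = """\
[galaxy_amqp]
host = ip-10-125-10-182.ec2.internal
port = 5672
userid = biouser
password = tester
"""


def read_amqp_settings(ini_text):
    """Return the galaxy_amqp settings as a dict (hypothetical helper)."""
    # interpolation=None avoids errors if other values in a real file
    # happen to contain '%' characters.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["galaxy_amqp"]
    return {
        "host": section["host"],
        "port": section.getint("port"),
        "userid": section["userid"],
        "password": section["password"],
    }


if __name__ == "__main__":
    print(read_amqp_settings(EXAMPLE_INI))
```

For a real file, the text would come from reading `/export/data/galaxy/universe_wsgi.ini` instead of `EXAMPLE_INI`.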
  • 52. Set up messaging queue
      $ sudo rabbitmqctl add_user biouser tester
      creating user 'biouser' ...
      ...done.
      $ sudo rabbitmqctl add_vhost bionextgen
      creating vhost 'bionextgen' ...
      ...done.
      $ sudo rabbitmqctl set_permissions -p bionextgen biouser ".*" ".*" ".*"
      setting permissions for user 'biouser' in vhost 'bionextgen' ..
      ...done.
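
The three `rabbitmqctl` calls on slide 52 can be scripted so the user, password and vhost stay consistent with the configuration files. The sketch below only builds the argument lists; `rabbitmq_setup_commands` is a hypothetical name, and nothing is executed unless the commands are passed to `subprocess.run`.

```python
import shlex


def rabbitmq_setup_commands(user, password, vhost):
    """Build the three commands from the slide as argv lists."""
    return [
        ["sudo", "rabbitmqctl", "add_user", user, password],
        ["sudo", "rabbitmqctl", "add_vhost", vhost],
        # Grant configure, write and read permissions on everything in the vhost.
        ["sudo", "rabbitmqctl", "set_permissions", "-p", vhost, user,
         ".*", ".*", ".*"],
    ]


if __name__ == "__main__":
    # Print the commands for review; values are the ones used in the demo.
    for argv in rabbitmq_setup_commands("biouser", "tester", "bionextgen"):
        print(shlex.join(argv))
```

To run them on the server, each list could be passed to `subprocess.run(argv, check=True)`, which stops at the first failing step.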
  • 53. Run pipeline, examine results
      Ready to run distributed pipeline
      Demo data – two paired end fastq lanes
      Variant calling workflow
  • 54. Input sequence data
      $ ls -1 /export/data/exome_example/fastq/
      7_100326_FC6107FAAXX_1-chr22.fastq
      7_100326_FC6107FAAXX_2-chr22.fastq
      8_100326_FC6107FAAXX_1-chr22.fastq
      8_100326_FC6107FAAXX_2-chr22.fastq
  • 55. Run level: YAML Configuration
      $ cat /export/data/exome_example/config/run_info.yaml
      ---
      fc_date: '100326'
      fc_name: FC6107FAAXX
      details:
        - files: [7_100326_FC6107FAAXX_1-chr22.fastq, 7_100326_FC6107FAAXX_2-chr22.fastq]
          lane: 7
          description: Test replicate 1
          analysis: SNP calling
          genome_build: hg19
          algorithm:
            quality_format: Standard
            hybrid_bait: hybrid_selection/baits.bed
            hybrid_target: hybrid_selection/targets.bed
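
The demo fastq names appear to encode lane, date, flowcell name and read number, which are the same values entered in `run_info.yaml`. The sketch below checks that an entry and its file names agree. The naming pattern is an assumption inferred from the four demo files only, and `parse_fastq_name` and `check_entry` are hypothetical helpers, not part of the pipeline.

```python
import re

# Assumed pattern: <lane>_<date>_<flowcell>_<read><optional suffix>.fastq
FASTQ_NAME = re.compile(
    r"^(?P<lane>\d+)_(?P<date>\d{6})_(?P<flowcell>[A-Za-z0-9]+)"
    r"_(?P<read>[12])(?P<suffix>.*)\.fastq$"
)


def parse_fastq_name(filename):
    """Split a demo-style fastq file name into its components."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected fastq name: {filename}")
    info = match.groupdict()
    info["lane"] = int(info["lane"])
    info["read"] = int(info["read"])
    return info


def check_entry(fc_date, fc_name, lane, files):
    """Return a list of problems; an empty list means the entry is consistent."""
    problems = []
    parsed = [(name, parse_fastq_name(name)) for name in files]
    for name, info in parsed:
        if info["date"] != fc_date:
            problems.append(f"{name}: date does not match fc_date {fc_date}")
        if info["flowcell"] != fc_name:
            problems.append(f"{name}: flowcell does not match fc_name {fc_name}")
        if info["lane"] != lane:
            problems.append(f"{name}: lane does not match lane {lane}")
    if sorted(info["read"] for _, info in parsed) != [1, 2]:
        problems.append("expected exactly one _1 and one _2 file")
    return problems


if __name__ == "__main__":
    files = ["7_100326_FC6107FAAXX_1-chr22.fastq",
             "7_100326_FC6107FAAXX_2-chr22.fastq"]
    print(check_entry("100326", "FC6107FAAXX", 7, files))
```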
  • 56. System level: YAML Configuration
      $ cat /export/data/galaxy/post_process.yaml
      ---
      program:
        bowtie: bowtie
        bwa: bwa
        ucsc_bigwig: wigToBigWig
        picard: /usr/share/java/picard
        gatk: /usr/share/java/gatk
        snpEff: /usr/share/java/snpeff
        fastqc: fastqc
      distributed:
        cluster_platform: sge
        platform_args: '-q all.q'
        cores_per_host: 1
        rabbitmq_vhost: bionextgen
  • 57. Run exome pipeline
      $ cd /export/data/work
      $ distributed_nextgen_pipeline.py /export/data/galaxy/post_process.yaml /export/data/exome_example/fastq /export/data/exome_example/config/run_info.yaml
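
The pipeline script on slide 57 takes three positional arguments: the system configuration, the fastq directory and the run configuration. A small pre-flight check before launching can catch a mistyped path early. This is a sketch under that reading of the slide; `pipeline_argv` and `missing_inputs` are hypothetical helpers, not part of the pipeline.

```python
import os


def pipeline_argv(system_config, fastq_dir, run_info):
    """Assemble the command in the argument order shown on the slide."""
    return ["distributed_nextgen_pipeline.py", system_config, fastq_dir, run_info]


def missing_inputs(system_config, fastq_dir, run_info):
    """Return the input paths that do not exist on this machine."""
    missing = []
    if not os.path.isfile(system_config):
        missing.append(system_config)
    if not os.path.isdir(fastq_dir):
        missing.append(fastq_dir)
    if not os.path.isfile(run_info):
        missing.append(run_info)
    return missing


if __name__ == "__main__":
    # Paths from the demo cluster.
    args = (
        "/export/data/galaxy/post_process.yaml",
        "/export/data/exome_example/fastq",
        "/export/data/exome_example/config/run_info.yaml",
    )
    problems = missing_inputs(*args)
    if problems:
        print("Missing inputs:", ", ".join(problems))
    else:
        print("Ready to run:", " ".join(pipeline_argv(*args)))
```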
  • 58. What just happened?
  • 59. Monitoring: SGE queues
      $ qstat
      job-ID prior   name       state submit/start at queue
      --------------------------------------------------------------
      1      0.55500 nextgen_an r     18:16:32        all.q@ip-10-125-10-182.ec2.int
      2      0.55500 nextgen_an r     18:16:32        all.q@ip-10-86-254-105.ec2.int
      3      0.55500 automated_ r     18:16:47        all.q@ip-10-125-10-182.ec2.int
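
The `qstat` listing shows which node each job landed on. A short sketch of summarizing it per host follows; `SAMPLE` reproduces the abbreviated output from slide 59, and `jobs_per_host` is a hypothetical helper that assumes job rows start with a numeric job ID and contain one `queue@host` field.

```python
from collections import Counter

# Abbreviated qstat output as shown on the slide.
SAMPLE = """\
job-ID prior name state submit/start at queue
--------------------------------------------------------------
1 0.55500 nextgen_an r 18:16:32 all.q@ip-10-125-10-182.ec2.int
2 0.55500 nextgen_an r 18:16:32 all.q@ip-10-86-254-105.ec2.int
3 0.55500 automated_ r 18:16:47 all.q@ip-10-125-10-182.ec2.int
"""


def jobs_per_host(qstat_text):
    """Count jobs per execution host in qstat-style text."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip the header, the separator line and blank lines.
        if not fields or not fields[0].isdigit():
            continue
        # The queue instance is the field of the form queue@host.
        queue = next((f for f in fields if "@" in f), None)
        if queue is not None:
            counts[queue.split("@", 1)[1]] += 1
    return counts


if __name__ == "__main__":
    for host, count in sorted(jobs_per_host(SAMPLE).items()):
        print(host, count)
```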
  • 60. Monitoring: Analysis directory
      $ cd /export/data/work
      $ ls -lh
      drwxr-xr-x 4.0  alignments
      -rw-r--r-- 2.0K automated_initial_analysis.py.o11
      drwxr-xr-x 33   log
      -rw-r--r-- 15K  nextgen_analysis_server.py.o10
      -rw-r--r-- 15K  nextgen_analysis_server.py.o9
      drwxr-xr-x 102  tmp
  • 61. Monitoring: Log files
      $ less nextgen_analysis_server.py.o10
      INFO: nextgen_pipeline: Processing sample: Test replicate 2; lane 8; reference genome hg19; researcher ; analysis method SNP calling
      INFO: nextgen_pipeline: Aligning lane 8_100326_FC6107FAAXX with bwa aligner
      INFO: nextgen_pipeline: Combining and preparing wig file [u'', u'Test replicate 2']
      INFO: nextgen_pipeline: Recalibrating [u'', u'Test replicate 2'] with GATK
  • 62. Retrieve results: Copy files
      $ upload_to_galaxy.py /export/data/galaxy/post_process.yaml /export/data/exome_example/fastq /export/data/work /export/data/exome_example/config/run_info.yaml
      Final files copied into new directory; allows cleanup of analysis directory
  • 63. Retrieve results: Output directory
      $ ls -lh /export/data/galaxy/storage/100326_FC6107FAAXX/7
      -rw-r--r-- 38M  7_100326_FC6107FAAXX.bam
      -rw-r--r-- 22M  7_100326_FC6107FAAXX-coverage.bigwig
      -rw-r--r-- 72M  7_100326_FC6107FAAXX-gatkrecal.bam
      -rw-r--r-- 109K 7_100326_FC6107FAAXX-snp-effects.tsv
      -rw-r--r-- 827K 7_100326_FC6107FAAXX-snp-filter.vcf
      -rw-r--r-- 1.6M 7_100326_FC6107FAAXX-summary.pdf
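
Each lane's output directory on slide 63 holds six files sharing a lane prefix. A sketch of checking that a finished lane has all of them follows. The suffix list is taken from the slide's listing and is assumed to apply to other lanes; `missing_outputs` is a hypothetical helper.

```python
# Suffixes of the six result files listed on the slide.
EXPECTED_SUFFIXES = [
    ".bam",
    "-coverage.bigwig",
    "-gatkrecal.bam",
    "-snp-effects.tsv",
    "-snp-filter.vcf",
    "-summary.pdf",
]


def missing_outputs(lane_prefix, present_files):
    """Return expected result files for a lane that are not present."""
    present = set(present_files)
    return [
        lane_prefix + suffix
        for suffix in EXPECTED_SUFFIXES
        if lane_prefix + suffix not in present
    ]


if __name__ == "__main__":
    import os

    # Directory and prefix from the demo; adjust for other runs.
    out_dir = "/export/data/galaxy/storage/100326_FC6107FAAXX/7"
    found = os.listdir(out_dir) if os.path.isdir(out_dir) else []
    print(missing_outputs("7_100326_FC6107FAAXX", found))
```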
  • 64. Share results
      Share-an-instance
        Uses CloudMan web interface
      Reproducible research
        CloudBioLinux AMI – software
        CloudMan – data and configuration
  • 65. CloudMan console enables push button sharing
  • 66. Can make public or available to specific collaborators
  • 67. When finished, turn everything off through CloudMan
  • 68. Summary
      CloudBioLinux
        Shared machine image of biological software
        Boot from AWS console
        Connect with NX graphical client and ssh
  • 69. Summary
      CloudMan
        Cluster setup and management
        Boot from share-an-instance
        Manage cluster through web interface
        Share final results
  • 70. Summary
      Exome pipeline
        Parallel framework for running analyses
        Run using automated scripts
        Extract alignments, variant calls and summary information
  • 71. Future: interfaces make it easier
      https://bitbucket.org/hbc/galaxy-central-hbc
  • 72. Future: Simplified file selection
  • 73. Future: Top level parameters
  • 74. Future: Galaxy data libraries
  • 75. Future: Galaxy analysis
  • 76. Future: External UCSC visualization
  • 77. Read more
      Step-by-step instructions: http://j.mp/rp69nx
      Approaches to parallelism: http://j.mp/nPQHcm
      Future work: http://bcbio.wordpress.com