[Figure fragment, downstream pipeline stages. Statistics: Picard (target coverage), SamTools (BAM file statistics). Variant detection: SamTools (variant calling -> VCF), GATK (variant calling -> VCF).]
Improving automation, reproducibility and installation of genomic analysis pipelines with Docker
M. P. Caraciolo¹, F. V. Fiqueiredo¹, V. Monteiro¹
¹ Genomika Diagnósticos
ABSTRACT
Bioinformatics pipelines usually rely on a combination of several components, and
deploying them incurs a substantial configuration and maintenance burden.
Genomics and variant analysis pipelines are normally difficult to install, configure and
deploy. We tackled this issue with a scalable and repeatable approach using Docker
containers (lightweight virtualization). By encapsulating NGS workflows in
containers, a user can quickly deploy any pipeline version in any environment and
overcome several issues with the commonly used virtual-machine approach.
Our goal is to share our experience in developing, distributing and running
pipelines encapsulated in Docker containers.
bioinfo@genomika.com.br | genomika.com.br
Rua Senador José Henrique, 224, Alfred Nobel, Sala 1301 | Recife, PE | Brazil
INTRODUCTION AND MOTIVATION
The current approach, based on virtual machines (VMs), lacks portability, has substantial
overhead (disk, CPU, RAM) and requires resources to be
provisioned statically. The tools used in the pipelines are generally
installed by automated scripts, which may break when a tool is no longer
available or the wrong version is fetched. For biologists the problem is more
critical: finding and installing the required
software is hard, documentation is limited, and obtaining good results requires
experience.
WHAT IS DOCKER?
Docker is open-source software that isolates the tools
and software involved in processing and makes it easier
to recreate a snapshot of the pipeline's current environment
for reproducibility, without manual
re-installation of specific software versions.

[Figure (VM vs. container), layers listed from bottom to top.
VM: Server, Host OS, Hypervisor, then a Guest OS with its own Bins/Libs for each app (App A, App B, ...).
Container: Server, Host OS, Docker Engine, then Bins/Libs for each app (App A, App B, ...).]

PIPELINE ARCHITECTURE BEFORE CONTAINERS
[Figure (pipeline stages):
BWA (mapping and pairing)
-> SamTools (format conversion) -> BAM
-> Picard (remove duplicates) -> BAM
-> SamTools (remove reads with mapQV=0) -> BAM
-> GATK (local realignment around indels, quality score recalibration) -> BAM
Other labels in the diagram: IGV; final alignment in BAM format.]

OUR APPROACH
Dockerized Pipeline Approach 1
[Figure labels: Base Container; Workflow; BWA, SamTools, ..., Tool; mount; input/output; mounted volume or volume container; config file; input fastq.]
Pros: Single container, easy to maintain.
Cons: VM-like approach; a huge, monolithic container that is
difficult to share (against the Docker philosophy).

Dockerized Pipeline Approach (in progress)
[Figure labels (containerized apps): Container A (BWA); Container B (SamTools); Container C (Tool); Workflow; mount; input/output; mounted volume or volume container; config file; input fastq.]
Pros: Completely modularized;
workflow components are easy to re-use and share.
Cons: “Container hell”?

Benchmarks
Mean execution times for pipelines and tasks with and without Docker.
Time is expressed in minutes. The mean and the standard deviation were estimated from 10
separate runs. Slowdown is the ratio of the mean execution time with Docker to the
mean execution time without Docker.

Pipeline: Variant calling pipeline for WES
Tasks: 48
                               Native    Docker
Mean task time                 26.5      27.1
Mean execution time            1254.4    1293.8
Execution time std. deviation  4.9       2.6
Slowdown: 1.022
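The benchmark statistics described above (mean, standard deviation and slowdown over repeated runs) can be sketched in a few lines of Python. This is only an illustration: the function name `summarize_runs` and the timings are hypothetical and are not the poster's measurements, and the use of the sample standard deviation is an assumption, since the poster does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_runs(native_minutes, docker_minutes):
    """Summarize two sets of run times (in minutes).

    Slowdown is the mean time with Docker divided by the mean time without it.
    The sample standard deviation (n - 1) is an assumption of this sketch.
    """
    native_mean = mean(native_minutes)
    docker_mean = mean(docker_minutes)
    return {
        "native_mean": native_mean,
        "docker_mean": docker_mean,
        "native_std": stdev(native_minutes),
        "docker_std": stdev(docker_minutes),
        "slowdown": docker_mean / native_mean,
    }


# Hypothetical timings for 10 runs each; NOT the measurements from the poster.
native = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 102.0, 98.0, 101.0, 99.0]
docker = [t + 3.0 for t in native]

summary = summarize_runs(native, docker)
print(f"slowdown = {summary['slowdown']:.3f}")
```

A slowdown close to 1.0 means the containerized run takes about as long as the native one.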
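The modular layout (one container per tool, with input/output shared through a mounted volume) could be driven by a small wrapper such as the sketch below. It only builds the `docker run` command line and does not execute it. The image names, tags, host path and the helper `docker_run_argv` are hypothetical; the poster does not name its images or describe its wrapper.

```python
import shlex

# Hypothetical image names and tags, one per pipeline tool.
STEP_IMAGES = {
    "bwa": "example/bwa:0.7.12",
    "samtools": "example/samtools:1.2",
}


def docker_run_argv(step, tool_args, host_data_dir, container_data_dir="/data"):
    """Build the argv that runs one pipeline step in its own container.

    The host directory is bind-mounted into the container with -v so that
    every step reads and writes the same input/output location.
    """
    image = STEP_IMAGES[step]
    return [
        "docker", "run", "--rm",  # --rm removes the container when it exits
        "-v", f"{host_data_dir}:{container_data_dir}",
        image,
        *tool_args,
    ]


argv = docker_run_argv(
    "samtools",
    ["samtools", "flagstat", "/data/sample.bam"],
    "/srv/run42",  # hypothetical host directory holding the run's files
)
print(shlex.join(argv))
```

Keeping the command as a list of arguments, rather than one shell string, lets it be passed directly to `subprocess.run` without quoting problems.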
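Reproducing a pipeline environment depends on every tool image being tied to a specific version. As a minimal sketch of that idea, the hypothetical helper below flags image references that carry no explicit tag or digest, or that use the floating `latest` tag. The image names are illustrative, and the check is a simplification of Docker's image reference syntax, not a full parser.

```python
def is_pinned(image_ref):
    """Return True if the image reference names an explicit version.

    A reference counts as pinned when it carries a sha256 digest or a tag
    other than "latest". A colon in the registry part (host:port) is ignored
    by looking only at the last path component.
    """
    if "@sha256:" in image_ref:
        return True
    last_component = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return False
    tag = last_component.rsplit(":", 1)[1]
    return tag not in ("", "latest")


# Hypothetical image references, for illustration only.
for ref in ["example/bwa:0.7.12", "example/samtools", "example/picard:latest"]:
    print(ref, "pinned" if is_pinned(ref) else "NOT pinned")
```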

Docker poster bsb2015-print
