[Figure, pipeline stages. Statistics: Picard (target coverage), SamTools (BAM file statistics). Variant detection: SamTools (variant calling → VCF), GATK (variant calling → VCF).]
Improving automation, reproducibility and installation of genomic analysis pipelines with Docker
M. P. Caraciolo¹, F. V. Fiqueiredo¹, V. Monteiro¹
¹ Genomika Diagnósticos
ABSTRACT
Bioinformatics pipelines usually rely on a combination of several components, and
deploying them incurs a substantial configuration and maintenance burden.
Genomics and variant analysis pipelines are normally difficult to install, configure and
deploy. We tackled this issue with a scalable and repeatable approach using Docker
containers (lightweight virtualization). With NGS workflows encapsulated in
containers, a user can quickly deploy any pipeline version in any environment and
overcome several issues of the commonly used approaches based on virtual machines.
Our goal is to share our experience in developing, distributing and running
pipelines encapsulated in Docker containers.
bioinfo@genomika.com.br | genomika.com.br
Rua Senador José Henrique, 224, Alfred Nobel, Sala 1301 | Recife, PE | Brazil
INTRODUCTION AND MOTIVATION
The current approach, based on VMs, lacks portability, has substantial
overhead (disk, CPU, RAM) and requires resources to be provisioned
statically. The tools used in the pipelines are generally installed by
automated scripts, which may break because a tool no longer exists or
the wrong version is fetched. For biologists the problem is more
critical: finding and installing the required software is difficult,
documentation is limited, and obtaining good results requires
experience.
WHAT IS DOCKER?
Docker is open-source software that isolates the tools and software
involved in processing. It makes it easier to recreate a snapshot of the
pipeline's current environment for reproducibility, without manual
re-installation of specific software versions.
[Figure: virtual machine vs. container stacks. Virtual machines: Server, Host OS, Hypervisor, then a Guest OS with Bins/Libs for each of App A and App B. Containers: Server, Host OS, Docker Engine, then only Bins/Libs for each of App A and App B.]
OUR APPROACH
Dockerized Pipeline Approach 1
[Figure: a base container bundling the workflow and its tools (BWA, SamTools, ...); input/output is mounted from a mounted volume or volume container.]
Pros: Single container, easy to maintain
Cons: VM-like approach; huge, monolithic container,
difficult to share (against Docker philosophy)
Dockerized Pipeline Approach (in progress)
Pros: Completely modularized,
easy to re-use/share workflow components
Cons: "Container hell"?
REFERENCES
Boettiger C. 2015. An introduction to Docker for reproducible research. ACM SIGOPS Operating Systems
Review, Special Issue on Repeatability and Sharing of Experimental Artifacts 49(1):71-79.
Di Tommaso P, Chatzou M, Baraja P, Notredame C. 2014. Nextflow: a novel tool for highly scalable
computational pipelines.
Di Tommaso P, Palumbo E, Chatzou M, Prieto P, Heuer ML, Notredame C. 2015. The impact of Docker containers
on the performance of genomic pipelines. PeerJ PrePrints 3:e1428.
doi:10.7287/peerj.preprints.1171v2.
Felter W, Ferreira A, Rajamony R, Rubio J. 2014. An updated performance comparison of virtual machines
and Linux containers. IBM Research. Available at http://ibm.co/V55Otq (accessed 1 June 2015).
BENCHMARKS
Mean execution times for pipelines and tasks with and without Docker.
Time is expressed in minutes. The mean and the standard deviation were estimated from 10
separate runs. Slowdown is the ratio of the mean execution time with Docker to the
mean execution time without Docker.
Pipeline                          Tasks  Mean task time   Mean execution time  Execution time std. deviation  Slowdown
                                         Native  Docker   Native   Docker      Native  Docker
Variant calling pipeline for WES  48     26.5    27.1     1254.4   1293.8      4.9     2.6                    1.022
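The slowdown metric used in the benchmark (mean execution time with Docker divided by mean execution time without it) can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `slowdown` and the run times below are hypothetical and are not the poster's measurements.

```python
from statistics import mean, stdev


def slowdown(native_minutes, docker_minutes):
    """Ratio of the mean Docker run time to the mean native run time."""
    return mean(docker_minutes) / mean(native_minutes)


# Hypothetical run times in minutes, invented for illustration only.
native_runs = [100.0, 102.0, 98.0, 101.0, 99.0]
docker_runs = [103.0, 104.0, 101.0, 102.0, 105.0]

for label, runs in (("native", native_runs), ("docker", docker_runs)):
    print(f"{label}: mean={mean(runs):.1f} min, std. dev.={stdev(runs):.2f} min")

print(f"slowdown: {slowdown(native_runs, docker_runs):.3f}")
```

A value of 1.0 means no measurable overhead; values slightly above 1.0 indicate a small cost for running inside containers.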
PIPELINE ARCHITECTURE BEFORE CONTAINERS
[Figure: pipeline steps, in order.]
BWA: mapping and pairing
SamTools: format conversion → BAM
Picard: remove duplicates → BAM
SamTools: remove reads with mapQV=0 → BAM
GATK: local realignment around indels, quality score recalibration → BAM
Final alignment in BAM format
[Other labels in the figure: IGV, Tool, config file, input fastq.]
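[Figure: Dockerized Pipeline Approach (in progress). Containerized apps, one tool per container (Container A: BWA; Container B: SamTools; Container C: Tool), plus the workflow. Input/output (config file, input fastq) is mounted from a mounted volume or volume container.]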
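The modular layout (one container per tool, with input and output shared through a mounted volume) could be driven by a small script. The following is a minimal sketch, not the authors' implementation: the image names and tags, the host directory `/srv/run01` and the file names are all hypothetical. It only builds and prints the `docker run` command lines, so it runs without Docker installed.

```python
import shlex

DATA_DIR = "/srv/run01"  # hypothetical host directory holding config and FASTQ input
MOUNT = "/data"          # where the shared volume appears inside every container

# Hypothetical images, one per tool; pinning a tag fixes the tool version.
STEPS = [
    ("example/bwa:0.7.12",
     "bwa mem /data/ref.fa /data/r1.fastq /data/r2.fastq > /data/aln.sam"),
    ("example/samtools:1.2",
     "samtools view -b -o /data/aln.bam /data/aln.sam"),
]


def docker_command(image, tool_command, host_dir=DATA_DIR, mount=MOUNT):
    """Build the argv that runs one tool in its own throwaway container."""
    return [
        "docker", "run", "--rm",       # remove the container when the step ends
        "-v", f"{host_dir}:{mount}",   # share input/output through the volume
        image,
        "sh", "-c", tool_command,      # assumes the image provides a shell
    ]


commands = [docker_command(image, cmd) for image, cmd in STEPS]
for argv in commands:
    print(shlex.join(argv))
```

To execute the steps for real, each `argv` could be passed to `subprocess.run(argv, check=True)` in order, so that a failing tool stops the workflow. Because every step reads and writes under the same mount, the containers stay independent and can be swapped or re-used individually.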

Docker poster bsb2015-print
