Standardising Swedish genomics analyses using nextflow

Phil Ewels (@ewels, @tallphil)
Nextflow Meeting, 2017-09-14, CRG, Barcelona

The SciLifeLab National Genomics Infrastructure is one of the largest sequencing facilities in Europe. We are an accredited facility providing library preparation, sequencing, basic analysis and quality control for Swedish research groups. Our sample throughput requires a highly automated and robust bioinformatics platform. Until recently, we had multiple analysis pipelines built with a range of different workflow tools for each data type. This made development work difficult and led to inevitable technical debt. In this talk I will describe how we have migrated to Nextflow for a range of our data types, the difficulties that we faced and how we hope to leverage Nextflow to migrate to the cloud in coming years.

Standardising Swedish genomics analyses using nextflow

  1. Standardising Swedish genomics analyses using nextflow
     Phil Ewels (@ewels, @tallphil)
     Nextflow Meeting, 2017-09-14, CRG, Barcelona
  2. NGI Stockholm (SciLifeLab NGI)
     • Sequencers: 2 x MiSeq, 5 x HiSeq 2500, 5 x HiSeq X10, NovaSeq
     • [Bar chart: Number of Samples in 2016, by application (RNA-Seq, WG Re-Seq, Targeted Re-Seq, Metagenomics, Others); bar labels 1,265, 2,580, 3,214, 8,934 and 12,017; axis 0 to 14,000]
     • 1141 Gbp/day: 1X Human Genome every 4 minutes
  3. NGI Stockholm (SciLifeLab NGI): sequencing output
     [Line chart: cumulative output (Mbp), Jan 2012 to May 2017; y-axis 0 to 1,000,000,000]
  4. NGI bioinformatics
     • Initial data analysis for major protocols
     • Internal QC and standardised starting point for users
     • Team of 10 bioinformaticians
     • Accredited facility
  5. analysis requirements
     • Automated
     • Reliable
     • Easy for others to run
     • Reproducible results
     (icons: the noun project)
  6. NGI pipelines: NouGAT (de-novo)
  7. what have we learnt?
  8. sharing is caring
  9. sharing is caring
     • Open-source on GitHub from day one
     • Easier help and feedback from others
     • Other people may help to develop your code
     • https://github.com/nextflow-io/awesome-nextflow
  10. use containers
  11. use containers
      • Create a Docker image, even if you don’t think you need to
      • Makes local and automated testing possible
      • Future proof for cloud / Singularity / other people
  12. test, test and test again
  13. test, test and test again
      • Find a small test dataset
      • Make a bash script to fetch data and launch pipeline
      • Different flags to launch with different parameters
      • Use Travis build matrix to launch parallel test runs
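
The test-launcher idea on slide 13 can be sketched roughly as follows. This is a Python illustration rather than the bash script the slide describes. The mode names, the `test_data/` paths and the `docker` profile name are hypothetical; `--reads` and `--singleEnd` are pipeline parameters that appear later in the talk.

```python
import shlex

# Hypothetical test modes: each maps to extra pipeline parameters.
# --reads and --singleEnd mirror params used elsewhere in the talk;
# the file paths and mode names are illustrative assumptions.
TEST_MODES = {
    "paired": ["--reads", "test_data/*_{1,2}.fastq.gz"],
    "single": ["--reads", "test_data/*.fastq.gz", "--singleEnd"],
}


def build_command(mode, profile="docker"):
    """Return the nextflow command line (as a list) for one test mode."""
    if mode not in TEST_MODES:
        raise ValueError(f"unknown test mode: {mode}")
    return ["nextflow", "run", "main.nf", "-profile", profile] + TEST_MODES[mode]


if __name__ == "__main__":
    # Print one command per mode; a CI build matrix could run one mode per job.
    for mode in TEST_MODES:
        print(shlex.join(build_command(mode)))
```

Keeping each mode as data makes it straightforward to map one CI matrix entry to one mode.
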
  14. use versioned releases
  15. use versioned releases
  16. minimal configs
  17. minimal configs
      • Build config files around blocks of function
      • Hardware / software deps / genome references
      • Use nextflow profiles
      • Even if only using ‘standard’ default
      • Don’t be afraid to use multiple configs per profile
      • Build on a base profile and be clever with limits
  18. minimal configs

      nextflow.config:

          // Cap a requested resource at the configured maximum
          def check_max(obj, type) {
            if(type == 'memory'){
              // compareTo returns -1, 0 or 1, so test for "greater than" explicitly;
              // a bare compareTo() is truthy whenever the two values differ
              if(obj.compareTo(params.max_memory) > 0)
                return params.max_memory
              else
                return obj
            } else if(type == 'time'){
              if(obj.compareTo(params.max_time) > 0)
                return params.max_time
              else
                return obj
            } else if(type == 'cpus'){
              return Math.min( obj, params.max_cpus )
            }
          }

      conf/base.config:

          process {
            cpus = { check_max(16, 'cpus') }
            memory = { check_max(128.GB, 'memory') }
            time = { check_max(10.h, 'time') }
          }

      nextflow.config:

          profiles {
            standard {
              includeConfig 'conf/base.config'
              includeConfig 'conf/igenomes.config'
              includeConfig 'conf/uppmax.config'
            }
            devel {
              includeConfig 'conf/base.config'
              includeConfig 'conf/igenomes.config'
              includeConfig 'conf/uppmax.config'
              includeConfig 'conf/uppmax-dev.config'
            }
          }

      conf/uppmax-dev.config:

          params {
            max_cpus = 1
            max_memory = 16.GB
            time = 1.h
          }
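
The capping logic on slide 18 boils down to "use the request unless it exceeds the limit". A minimal Python sketch of that idea follows. It is not the Nextflow config itself: plain numbers (cores, GB, hours) stand in for Nextflow's memory and duration objects, and `DEV_LIMITS` is a hypothetical stand-in for `params.max_cpus`, `params.max_memory` and `params.max_time`.

```python
# Hypothetical per-environment limits, standing in for params.max_cpus,
# params.max_memory and params.max_time (units: cores, GB, hours).
DEV_LIMITS = {"cpus": 1, "memory": 16, "time": 1}


def check_max(requested, kind, limits):
    """Return the request unchanged unless it exceeds the limit for `kind`."""
    if kind not in limits:
        raise ValueError(f"unknown resource type: {kind}")
    return min(requested, limits[kind])


# The base requests from the slide (16 CPUs, 128 GB, 10 h) under the dev limits.
base_request = {"cpus": 16, "memory": 128, "time": 10}
capped = {kind: check_max(value, kind, DEV_LIMITS) for kind, value in base_request.items()}
print(capped)  # {'cpus': 1, 'memory': 16, 'time': 1}
```

A request below the limit passes through untouched, which is what lets one base profile serve both a large cluster and a small development node.
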
  19. reference genomes
  20. reference genomes

      conf/references.conf:

          params {
            genomes {
              'GRCh37' {
                fasta = '/refs/human/genome.fasta'
                gtf = '/refs/human/genes.gtf'
              }
              'GRCm38' {
                fasta = '/refs/mouse/genome.fasta'
                gtf = '/refs/mouse/genes.gtf'
              }
            }
          }

      main.nf:

          params.fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false
          params.gtf = params.genome ? params.genomes[ params.genome ].gtf ?: false : false

      Usage:

          $ nextflow run main.nf --genome GRCh37
          $ nextflow run main.nf --fasta /path/to/my/genome.fa

  21. reference genomes
      • Illumina iGenomes is a great resource for this
      • Standard organisation allows easy use of multiple genomes
      • Use AWS iGenomes for free on AWS S3
      • See https://ewels.github.io/AWS-iGenomes/
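
The lookup order behind slide 20 (an explicit path such as `--fasta` wins, otherwise fall back to the table entry for `--genome`) can be illustrated with a short Python sketch. The function name `resolve_reference` is hypothetical, and unlike the Groovy one-liner it returns `None` rather than `false` and tolerates an unknown genome key.

```python
# Reference table mirroring the params.genomes block on slide 20.
GENOMES = {
    "GRCh37": {"fasta": "/refs/human/genome.fasta", "gtf": "/refs/human/genes.gtf"},
    "GRCm38": {"fasta": "/refs/mouse/genome.fasta", "gtf": "/refs/mouse/genes.gtf"},
}


def resolve_reference(key, genome=None, explicit=None, genomes=GENOMES):
    """Pick a reference file: explicit path first, then the genome table, else None."""
    if explicit:
        return explicit
    if genome:
        # Unknown genome keys or missing entries resolve to None.
        return genomes.get(genome, {}).get(key)
    return None


print(resolve_reference("fasta", genome="GRCh37"))
print(resolve_reference("fasta", explicit="/path/to/my/genome.fa"))
```
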
  22. problems we’ve hit
  23. dodgy file patterns
  24. dodgy file patterns

          Channel
            .fromFilePairs( params.reads, size: -1 )

      If the glob pattern doesn’t use {1,2} then all paired-end (PE) files are run in single-end (SE) mode

          Channel
            .fromFilePairs( params.reads, size: params.singleEnd ? 1 : 2 )

      If the glob pattern doesn’t use {1,2} then the pipeline exits with no matching files
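
The two failure modes on slide 24 can be reproduced with a small Python model of grouping by sample prefix. This is only a sketch of the behaviour the slide describes, not Nextflow's implementation: the `_1`/`_2` suffix convention, the `pair_aware` flag (standing in for a glob that does or does not use `{1,2}`) and the raised error are all assumptions.

```python
import re
from collections import defaultdict


def group_reads(filenames, size, pair_aware=True):
    """Group FASTQ names by prefix and keep groups of `size` files (-1 keeps all)."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        if pair_aware:
            # Pattern with {1,2}: the mate suffix is stripped, so mates share a prefix.
            prefix = re.sub(r"_[12]\.fastq\.gz$", "", name)
        else:
            # Pattern without {1,2}: every file ends up with its own prefix.
            prefix = re.sub(r"\.fastq\.gz$", "", name)
        groups[prefix].append(name)
    kept = {p: f for p, f in groups.items() if size == -1 or len(f) == size}
    if not kept:
        raise ValueError("no matching files")
    return kept


files = ["a_1.fastq.gz", "a_2.fastq.gz"]
print(group_reads(files, 2))                      # one paired group
print(group_reads(files, -1, pair_aware=False))   # two single-file groups
```

With `size=-1` the mis-specified pattern silently yields two single-file groups, whereas a fixed size turns the same mistake into an immediate, visible failure.
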
  25. overwriting params
  26. overwriting params

      Overwriting params inside the script (this now triggers a warning message):

          // Custom trimming options
          params.clip_r1 = 0
          params.clip_r2 = 0
          params.three_prime_clip_r1 = 0
          params.three_prime_clip_r2 = 0

          // Preset trimming options
          params.pico = false
          if (params.pico){
            params.clip_r1 = 3
            params.clip_r2 = 0
            params.three_prime_clip_r1 = 0
            params.three_prime_clip_r2 = 3
          }

      Using regular variables:

          // Custom trimming options
          params.clip_r1 = 0
          params.clip_r2 = 0
          params.three_prime_clip_r1 = 0
          params.three_prime_clip_r2 = 0

          // Define regular variables
          clip_r1 = params.clip_r1
          clip_r2 = params.clip_r2
          tp_clip_r1 = params.three_prime_clip_r1
          tp_clip_r2 = params.three_prime_clip_r2

          // Preset trimming options
          params.pico = false
          if (params.pico){
            clip_r1 = 3
            clip_r2 = 0
            tp_clip_r1 = 0
            tp_clip_r2 = 3
          }
  27. quick-fire round
  28. MultiQC in workflows
  29. MultiQC in workflows

          params.multiqc_config = "$baseDir/conf/multiqc_config.yaml"
          multiqc_config = file(params.multiqc_config)

          process run_multiqc {
            input:
            file multiqc_config
            file ('fastqc/*') from fastqc_results.collect()
            file ('alignment/*') from alignment_logs.collect()

            output:
            file "*multiqc_report.html" into multiqc_report
            file "*_data"

            script:
            """
            multiqc -f --config $multiqc_config .
            """
          }

      conf/multiqc_config.yaml:

          extra_fn_clean_exts:
              - _R1
              - _R2
          report_comment: >
              This report has been generated by the NGI-RNAseq analysis pipeline.
              For information about how to interpret these results, please see the docs.

  30. software versions

      main.nf:

          process get_software_versions {
            output:
            file 'software_versions_mqc.yaml' into software_versions_yaml

            script:
            """
            echo $pipeline_version > v_ngi_methylseq.txt
            echo $workflow.nextflow.version > v_nextflow.txt
            fastqc --version > v_fastqc.txt
            samtools --version > v_samtools.txt
            scrape_software_versions.py > software_versions_mqc.yaml
            """
          }
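
The slides do not show `scrape_software_versions.py`, so the following is only a guess at what such a script might do: pull a version number out of each tool's raw output and print a MultiQC custom-content section. The regular expressions, the section `id` and the HTML layout are assumptions, and the real script reads the `v_*.txt` files rather than taking a dictionary.

```python
import re

# Assumed patterns for the first line of each tool's --version output.
VERSION_PATTERNS = {
    "FastQC": r"FastQC v(\S+)",
    "Samtools": r"samtools (\S+)",
}


def scrape_versions(raw_outputs):
    """Map tool name -> version string, or 'N/A' when the pattern does not match."""
    versions = {}
    for tool, pattern in VERSION_PATTERNS.items():
        match = re.search(pattern, raw_outputs.get(tool, ""))
        versions[tool] = match.group(1) if match else "N/A"
    return versions


def to_multiqc_yaml(versions):
    """Render the versions as a custom-content YAML section holding an HTML list."""
    lines = [
        "id: 'software_versions'",
        "section_name: 'Software Versions'",
        "plot_type: 'html'",
        "data: |",
        "    <dl>",
    ]
    for tool, version in versions.items():
        lines.append(f"        <dt>{tool}</dt><dd>{version}</dd>")
    lines.append("    </dl>")
    return "\n".join(lines)


raw = {"FastQC": "FastQC v0.11.5\n", "Samtools": "samtools 1.5\nUsing htslib 1.5\n"}
print(to_multiqc_yaml(scrape_versions(raw)))
```

Falling back to `N/A` keeps the report section intact when one tool's output format changes.
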
  31. email notifications
  32. email notifications

      From the Nextflow documentation:

          workflow.onComplete {
            def subject = 'My pipeline execution'
            def recipient = 'me@gmail.com'

            ['mail', '-s', subject, recipient].execute() << """

            Pipeline execution summary
            ---------------------------
            Completed at: ${workflow.complete}
            Duration    : ${workflow.duration}
            Success     : ${workflow.success}
            workDir     : ${workflow.workDir}
            exit status : ${workflow.exitStatus}
            Error report: ${workflow.errorReport ?: '-'}
            """
          }

  33. email notifications

      main.nf:

          workflow.onComplete {
            // Assumed: the slide does not show where `engine` is created
            def engine = new groovy.text.GStringTemplateEngine()

            // Render the HTML template
            // (`email_fields`, the map of template values, is not shown on the slide)
            def hf = new File("$baseDir/assets/email_template.html")
            def html_template = engine.createTemplate(hf).make(email_fields)
            def email_html = html_template.toString()

            // Send the HTML e-mail
            // (`sendmail_html`, the full message with headers, is not shown on the slide)
            if (params.email) {
              [ 'sendmail', '-t' ].execute() << sendmail_html
              log.info "[NGI-MethylSeq] Sent summary e-mail to $params.email (sendmail)"
            }

            // Write summary e-mail HTML to a file
            def output_d = new File( "${params.outdir}/pipeline_info/" )
            if( !output_d.exists() ) {
              output_d.mkdirs()
            }
            def output_hf = new File( output_d, "pipeline_report.html" )
            output_hf.withWriter { w -> w << email_html }
          }

  34. email notifications

      assets/email_template.html (excerpt):

          <html>
          <head><title>NGI-MethylSeq Pipeline Report</title></head>
          <body>
          <h1>NGI-MethylSeq: Bisulfite-Seq Best Practice v${version}</h1>
          <h2>Run Name: $runName</h2>

          <% if (success){
              out << """
              <div style="color: green;">NGI-MethylSeq execution completed successfully!</div>
              """
          } else {
              out << """
              <div style="color: red;">
                <h4>NGI-MethylSeq execution completed unsuccessfully!</h4>
                <p>The exit status of the failed task was: <code>$exitStatus</code>.</p>
                <p>The full error message was:</p>
                <pre>${errorReport}</pre>
              </div>
              """
          }
          %>
  35. email notifications (example e-mail subject: [NGI-RNAseq] Successful: Test RNA Run)
  36. email notifications
  37. email notifications (example e-mail subject: [NGI-RNAseq] FAILED: Test RNA Run)
  38. Groovy syntax highlighting

      Editor modelines at the top of main.nf:

          #!/usr/bin/env nextflow
          /*
          vim: syntax=groovy
          -*- mode: groovy;-*-
          */

      The same line shown without and with highlighting:

          run-STAR = params.runstar

  39. saving intermediates

          publishDir "${params.outdir}/trim_galore", mode: 'copy',
            saveAs: {fn ->
              if (fn.indexOf("_fastqc") > 0) "FastQC/$fn"
              else if (fn.indexOf("trimming_report") > 0) "logs/$fn"
              else params.saveTrimmed ? fn : null
            }

          publishDir "${params.outdir}/STAR", mode: 'copy',
            saveAs: { fn -> params.saveAlignedIntermediates ? fn : null }
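
The first `saveAs` closure on slide 39 is a small routing function: return a path to publish the file, or nothing to skip it. A Python rendering of the same decision, for illustration only (the function name `publish_path` and the example filenames are hypothetical):

```python
def publish_path(filename, save_trimmed=False):
    """Return the path to publish `filename` under, or None to skip publishing it."""
    # str.find(...) > 0 mirrors Groovy's indexOf(...) > 0: the marker must occur
    # somewhere after the first character of the filename.
    if filename.find("_fastqc") > 0:
        return f"FastQC/{filename}"
    if filename.find("trimming_report") > 0:
        return f"logs/{filename}"
    # Trimmed reads are intermediates: publish them only when asked to.
    return filename if save_trimmed else None


print(publish_path("sample_1_fastqc.html"))
print(publish_path("sample_1.fq.gz_trimming_report.txt"))
print(publish_path("sample_1_trimmed.fq.gz"))
```

Reports and logs are always kept, while the large trimmed FASTQ files are published only on request.
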
  40. future plans
      • Use Singularity for everything
      • Benchmark AWS run pricing for future planning
      • Refine pipelines
      • Improve resource requests
      • Automate launch and run management
  41. Phil Ewels
      phil.ewels@scilifelab.se, ewels, tallphil
      http://github.com/SciLifeLab
      http://opensource.scilifelab.se

      Acknowledgements
      • NGI Stockholm
      • Max Käller, Rickard Hammarén, Denis Moreno, Francesco Vezzi
      • NGI Stockholm Genomics Applications Development Group
      • Paolo Di Tommaso
      • The nextflow community
