Assembly: before and after

A talk I gave at the December 2013 Assembly Masterclass at UC Davis. Really licensed under CC0. Updated in May 2014 for the presentation I gave at the combined SeRC Nordic Assembly Workshop in Stockholm, Sweden, on May 14th 2014.

  1. Assembly – before and after. Lex Nederbragt, lex.nederbragt@ibv.uio.no, @lexnederbragt
  2. A warning: the list is by no means complete, nor do we have experience with all the programs mentioned
  3. Workflow: sample → DNA isolation → DNA → sequencing → reads → genome assembly → assembly, with QC of the DNA, the reads and the assembly
  4. Reads → genome assembly → assembly: QC
  5. FastQC
  6. PRINSEQ
  7. Many others… (e.g. NGS QC Toolkit, www.nipgr.res.in/ngsqctoolkit.html)
  8. preqc (SGA): http://arxiv.org/abs/1307.8026
  9. Reads → genome assembly → assembly: grooming
  10. Format conversion: "FASTQ format hell" (http://en.wikipedia.org/wiki/FASTQ_format)
  11. Adapter/quality trimming (http://www.biostars.org/p/53528/): Celera Assembler (overlap-based trimming), FASTX-Toolkit, seqtk, PRINSEQ, NGS QC Toolkit, Trimmomatic, Biopieces, cutadapt, …
  12. Mate pair splitting and orientation: Illumina paired-end reads (insert size 150–600 bases), Illumina mate pair reads (2–40 kilobases), 454 mate pair reads (2–40 kilobases, mates separated by a linker)
  13. Mate pair splitting and orientation: Illumina mate pair reads (with a junction in either read) and 454 mate pair reads (with a linker), each plus paired-end reads as 'contamination'
  14. Mate pair splitting and orientation (as above), with the advice: check what orientation your assembler expects for the reads!
  15. Reads → genome assembly → assembly: preparing
  16. Error correction: stand-alone or built into the assembler
  17. Merging pairs (list from Torsten Seemann's blog, http://thegenomefactory.blogspot.no/2012/11/tools-to-merge-overlapping-paired-end.html): COPE (http://sourceforge.net/projects/coperead/), SeqPrep (https://github.com/jstjohn/SeqPrep), FLASH (http://www.cbcb.umd.edu/software/flash), fastq-join (http://code.google.com/p/ea-utils/wiki/FastqJoin), PANDAseq (https://github.com/neufeld/pandaseq), mergePairs.py (http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py). Recent addition
  18. Extending reads (ARF-PE): http://140.116.235.124/~tliu/arf-pe/
  19. Digital normalisation: http://arxiv.org/abs/1203.4802
  20. Estimating the k-mer size to use: preqc (SGA), http://arxiv.org/abs/1307.8026
  21. Reads → genome assembly → assembly: what can the reads tell us about the genome?
  22. k-mer based: preqc (SGA, http://arxiv.org/abs/1307.8026), Kmerspectrumanalyzer, khmer (from Titus Brown)
  23. Reads → genome assembly → assembly: this talk
  24. Reads → genome assembly → assembly: QC
  25. Genome assembly: comparing to each other, metrics, merging, improvement, visualization, validation, comparing to reference
  26. Genome assembly: comparing to each other, metrics, merging, improvement, visualization, validation, comparing to reference (section: metrics)
  27. Assemblathon stats: http://korflab.ucdavis.edu/datasets/Assemblathon/Assemblathon2/Basic_metrics/assemblathon_stats.pl or https://github.com/lexnederbragt/sequencetools/
  28. Genome assembly: comparing to each other, metrics, merging, improvement, visualization, validation, comparing to reference (section: improvement)
  29. Gap closing: IMAGE2
  30. Correcting bases: Quiver from Pacific Biosciences
  31. Separate scaffolding
  32. Genome assembly: comparing to each other, metrics, merging, improvement, visualization, validation, comparing to reference (section: merging)
  33. Assembly merging/reconciliation
  34. Genome assembly: comparing to each other, metrics, merging, improvement, visualization, validation, comparing to reference (section: validation)
  35. Mapped genomic reads: FRCBAM
  36. Mapped transcriptomic reads
  37. Gene finding
  38. Binning by per-contig read depth: Bacteroides, Proteobacteria, Cyanobacteria (Nederbragt et al., 2010)
  39. Genome assembly: comparing to each other, metrics, merging, improvement, visualization, validation, comparing to reference (section: visualization)
  40. Genome browser(s): IGV
  41. Genome assembly: comparing to each other, metrics, merging, improvement, visualization, validation, comparing to reference (section: comparing to each other)
  42. Comparative measures: Log Average Probability (LAP), Assembly Likelihood Evaluation (ALE). See also Howison, Zapata and Dunn (2013), "Toward a statistically explicit understanding of de novo sequence assembly", doi:10.1093/bioinformatics/btt525
  43. Genome assembly: comparing to each other, metrics, merging, improvement, visualization, validation, comparing to reference (section: comparing to reference)
  44. Reference comparison: Mauve Assembly Metrics
  45. Review
  46. Too many tools… http://seqanswers.com/wiki/Software/list
  47. Too many tools… http://wwwdev.ebi.ac.uk/fg/hts_mappers (88 short-read mappers)
  48. Embargo!
  49. Benchmarking, anyone?
  50. All-in-one assembly pipeline: doi:10.1186/1471-2105-15-126
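
The "FASTQ format hell" of slide 10 is largely about quality encodings. Illumina pipelines 1.3 to 1.7 wrote Phred+64 qualities, while Sanger and Illumina 1.8+ use Phred+33. The sketch below guesses the offset from the lowest quality character and shifts Phred+64 strings down by 31. It is a minimal illustration with hypothetical function names, not the behaviour of any particular conversion tool, and the guess can be fooled by uniformly high-quality Phred+33 data.

```python
# Sketch: guess a FASTQ quality offset and convert Phred+64 to Phred+33.
# Function names are hypothetical; the offset guess is a heuristic only.


def guess_offset(quality_strings):
    """Return 33, 64, or None (ambiguous) from the lowest quality character seen."""
    lowest = min(ord(c) for q in quality_strings for c in q)
    if lowest < 59:   # characters below ';' never occur in Phred+64/Solexa data
        return 33
    if lowest >= 64:  # '@' (Q0 in Phred+64) or higher: probably Phred+64,
        return 64     # although high-quality Phred+33 data can look the same
    return None       # ';' to '?': could be Solexa+64 or Phred+33


def phred64_to_phred33(quality):
    """Shift every quality character down by 31 (= 64 - 33)."""
    if any(ord(c) < 64 for c in quality):
        raise ValueError("not Phred+64: character below '@' found")
    return "".join(chr(ord(c) - 31) for c in quality)


def convert_fastq(text):
    """Convert a four-lines-per-record FASTQ string from Phred+64 to Phred+33."""
    lines = text.strip().split("\n")
    out = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        out += [header, seq, plus, phred64_to_phred33(qual)]
    return "\n".join(out) + "\n"


record = "@read1\nACGT\n+\nhhhB\n"
print(guess_offset(["hhhB"]))         # 64
print(convert_fastq(record), end="")  # quality line becomes III#
```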
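
The trimming tools on slide 11 use different algorithms. As one concrete example, here is a sketch of the 3' quality-trimming rule described for BWA's `-q` option: scan from the 3' end, accumulate cutoff minus quality, and cut where that running sum peaks. The function names and the cutoff of 20 are assumptions for illustration; adapter removal is not covered.

```python
# Sketch of BWA-style 3' quality trimming (running sum of cutoff - q).
# Names and the default cutoff are illustrative, not taken from any tool.


def quality_trim_index(qualities, cutoff=20):
    """Return how many bases to keep after trimming low-quality 3' bases.

    Walk from the 3' end, adding (cutoff - q) to a running sum; trim at the
    position where that sum peaks, and stop as soon as it drops below zero.
    """
    running = 0
    best = 0
    keep = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        running += cutoff - qualities[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


def trim_read(seq, qual, cutoff=20, offset=33):
    """Trim a read given its FASTQ quality string (Phred+33 by default)."""
    scores = [ord(c) - offset for c in qual]
    n = quality_trim_index(scores, cutoff)
    return seq[:n], qual[:n]


print(trim_read("ACGTAC", "IIII++"))  # ('ACGT', 'IIII')
```

Because the scan stops once the sum goes negative, a read ending in a high-quality base is left untouched; only a low-quality 3' tail is removed.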
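
Slides 12 to 14 warn about mate pair orientation. Illumina mate pair reads typically face outward (reverse-forward, RF), whereas paired-end reads face inward (forward-reverse, FR). The following sketch trims reads at the junction and reverse-complements both mates to turn an RF pair into FR. The junction sequence is kit-specific, so it is a parameter you supply; the 4-base "junction" in the example is made up, and all function names are hypothetical.

```python
# Sketch: split mate pair reads at the junction and flip their orientation.
# The junction sequence is kit-specific, so it is a parameter here; all
# function names are hypothetical.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_at_junction(seq, junction):
    """Keep only the bases before the first junction hit (either strand)."""
    hits = [p for p in (seq.find(junction), seq.find(revcomp(junction))) if p != -1]
    return seq[:min(hits)] if hits else seq


def rf_to_fr(read1, read2):
    """Reverse-complement both mates: an outward-facing (RF) pair becomes FR."""
    return revcomp(read1), revcomp(read2)


# Toy example with a made-up 4-base "junction"
mate1 = trim_at_junction("ACGCCAAGT", "TTGG")  # 'ACG' (hit on the reverse strand)
mate2 = "TTGC"
print(rf_to_fr(mate1, mate2))  # ('CGT', 'GCAA')
```

Whether to flip at all depends on what your assembler expects. Note that flipping the whole library also turns any paired-end 'contamination' into outward-facing pairs with a short insert.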
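
The pair-merging tools on slide 17 build on the same idea: when the fragment is shorter than the two reads combined, read 1 and the reverse complement of read 2 overlap and can be joined into one longer read. The sketch below is deliberately simple and does not reproduce the algorithm of any listed tool. It uses hypothetical names, ignores base qualities, and does not handle fragments shorter than a single read.

```python
# Sketch: merge an overlapping read pair into one longer read.
# Simplified: no base qualities, no handling of read-through into adapters.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatches=1):
    """Return the merged fragment, or None if no acceptable overlap is found.

    read2 is reverse-complemented onto read1's strand; the longest suffix of
    read1 matching a prefix of that sequence (with at most max_mismatches
    differences) joins them. In the overlap, read1's bases are kept.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    rc2 = revcomp(read2)
    for overlap in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        mismatches = sum(a != b for a, b in zip(read1[-overlap:], rc2[:overlap]))
        if mismatches <= max_mismatches:
            return read1 + rc2[overlap:]
    return None


# A 16-base fragment sequenced as two 10-base reads overlapping by 4 bases
print(merge_pair("AAACCCGGGT", "ACGTAAACCC", min_overlap=4))  # AAACCCGGGTTTACGT
```

Trying the longest overlap first favours the most conservative join. Real tools additionally score overlaps by mismatch rate and pick the higher-quality base at disagreements.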
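
Digital normalisation (slide 19, arXiv:1203.4802) discards reads whose k-mers have already been seen often enough. A read is kept only if the median abundance of its k-mers, counted over the reads kept so far, is below a cutoff C. The sketch below implements that rule with exact, strand-specific counts in a Python `Counter` instead of the memory-efficient probabilistic counting used by khmer. Function names and defaults are assumptions.

```python
# Sketch of digital normalisation ("normalize by median").
# Uses exact, strand-specific k-mer counts in a Counter, unlike khmer's
# probabilistic counting; parameter defaults here are illustrative.
from collections import Counter
from statistics import median


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize_by_median(reads, k=20, cutoff=20):
    """Keep a read only if the median count of its k-mers is below cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: skipped in this sketch
        if median([counts[km] for km in read_kmers]) < cutoff:
            kept.append(read)
            counts.update(read_kmers)  # only kept reads contribute counts
    return kept


reads = ["AACCGGTT"] * 10
print(len(normalize_by_median(reads, k=4, cutoff=3)))  # 3
```

Ten identical reads collapse to three here: once the read's k-mers have been counted three times, further copies add no new information and are dropped.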
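
Slide 22 lists k-mer based tools for learning about the genome from the reads. A classic use of the k-mer spectrum is a genome-size estimate. After excluding the low-multiplicity error k-mers, the total number of k-mers divided by the multiplicity of the main peak approximates the genome length. The sketch below makes simplifying assumptions: it uses hypothetical names, assumes a single clear peak, and ignores heterozygosity and repeats. The example histogram contains invented numbers.

```python
# Sketch: k-mer spectrum and a rough genome-size estimate.
# Simplified: assumes one clear coverage peak; ignores heterozygosity and
# repeats. Function names are hypothetical.
from collections import Counter


def kmer_histogram(reads, k):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    counts = Counter(read[i:i + k] for read in reads
                     for i in range(len(read) - k + 1))
    return Counter(counts.values())


def estimate_genome_size(histogram):
    """Total 'solid' k-mers divided by the multiplicity of the main peak."""
    multiplicities = sorted(histogram)
    valley = None
    # The first rise in the histogram marks the end of the error-k-mer slope.
    for prev, cur in zip(multiplicities, multiplicities[1:]):
        if histogram[cur] > histogram[prev]:
            valley = prev
            break
    if valley is None:
        return None  # no second peak found
    solid = {m: n for m, n in histogram.items() if m >= valley}
    peak = max(solid, key=solid.get)
    total_kmers = sum(m * n for m, n in solid.items())
    return total_kmers / peak


hist = {1: 1000, 2: 50, 3: 10, 10: 200, 11: 500, 12: 300}  # toy histogram
print(round(estimate_genome_size(hist), 1))  # 1011.8
```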
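
Basic metrics of the kind listed on slide 27 include N50: the length L such that contigs of length L or longer together contain at least half of the assembly. NG50 uses half of an (estimated) genome size instead of the assembly size, which makes assemblies of different total length comparable. Here is a small sketch of both, with hypothetical function names:

```python
# Sketch: contig lengths from FASTA text, then N50 / NG50.
# Function names are hypothetical.


def fasta_lengths(text):
    """Return the sequence length of every record in a FASTA string."""
    lengths, current = [], None
    for line in text.splitlines():
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths, genome_size=None):
    """N50 of the assembly, or NG50 if an (estimated) genome size is given."""
    target = (sum(lengths) if genome_size is None else genome_size) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None  # assembly covers less than half the genome: NG50 undefined


contigs = [100, 80, 50, 30, 20]
print(n50(contigs))                   # 80
print(n50(contigs, genome_size=400))  # 50
```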
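
FRCBAM (slide 35) summarises mapped-read evidence as a Feature Response Curve. As we understand the method, contigs are first sorted by decreasing length. For each threshold τ on the number of suspicious "features" (for example regions with unusual coverage or mate pair placement), the longest contigs are accumulated while their summed feature count stays at or below τ. The curve plots the resulting fraction of the estimated genome size. The sketch below takes precomputed per-contig feature counts; that input format and the function name are hypothetical, and deriving the features from a BAM file is not shown.

```python
# Sketch of a Feature Response Curve from per-contig feature counts.
# Input format and names are hypothetical; computing the features from a
# BAM file (what FRCBAM does) is not shown.


def feature_response_curve(contigs, genome_size, thresholds):
    """contigs: (length, feature_count) pairs.

    For each threshold tau, accumulate contigs from longest to shortest while
    the summed feature count stays <= tau; report covered / genome_size.
    """
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    curve = []
    for tau in thresholds:
        covered = features = 0
        for length, n_features in ordered:
            if features + n_features > tau:
                break
            features += n_features
            covered += length
        curve.append((tau, covered / genome_size))
    return curve


contigs = [(1000, 2), (500, 0), (300, 5)]
print(feature_response_curve(contigs, 2000, [0, 2, 7]))
# [(0, 0.0), (2, 0.75), (7, 0.9)]
```

A curve that rises quickly means a large part of the genome is covered by contigs with few suspicious features.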
