SeRC: de novo assembly workshop. Francesco Vezzi

De novo assembly, a multi-technology approach: Illumina, PacBio, and OpGen.
A multi-technology perspective on de novo assembly projects.

Published in: Science, Business, Technology
  1. De novo assembly, a multi-technology approach: Illumina, PacBio, and OpGen
     Francesco Vezzi, PhD
     Senior Bioinformatician, NGI-Stockholm
  2. Both Stockholm and Uppsala nodes
     Instrument                                 Count
     Illumina HiSeq 2000/2500                   16
     Illumina MiSeq                             3
     Life Technologies SOLiD 5500xl             4
     Life Technologies SOLiD 5500wildfire       2
     Life Technologies Ion Torrent              2
     Life Technologies Ion Proton               6
     Life Technologies Sanger ABI3730           2
     Pacific Biosciences RSII                   1
     Argus Whole Genome Mapping System          1
     One of the 3 best-equipped sequencing sites in Europe
  3. In this talk
     Illumina (Stockholm):
     • 100/150 bp paired reads (low error rate)
     • 900/200 Gbp in 6/2 day(s)
     PacBio (Uppsala):
     • 8.5 Kbp reads (max 30 Kbp, high error rate)
     • 375 Mbp (1 SMRT Cell) in 10 hours
     OpGen Argus System (Stockholm):
     • ~300 Kbp maps
     • 10 Gbp in ~1 day
  4. Optical Maps
     • Restriction Map
       ◦ Representation of the cut sites on a given DNA molecule, providing spatial information about genetic loci
     • An enzyme is selected and used to cut the molecules. This provides a 2D representation of the molecule structure
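The idea of a restriction map can be illustrated with an in-silico digest. The following is a minimal sketch, not the Argus software: the function name `restriction_map`, the example sequence, and the recognition site `GGATCC` are assumptions chosen for illustration, and the cut is placed at the start of each site for simplicity.

```python
def restriction_map(sequence, site):
    """Return (cut_positions, fragment_lengths) for a linear molecule.

    Simplification (assumption): the enzyme is taken to cut at the first
    base of each occurrence of its recognition site.
    """
    sequence = sequence.upper()
    site = site.upper()

    # Collect the start position of every (possibly overlapping) site.
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start)
        start = sequence.find(site, start + 1)

    # Fragment lengths are the distances between consecutive boundaries.
    boundaries = [0] + cuts + [len(sequence)]
    fragments = [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]
    return cuts, fragments


# Hypothetical 19 bp molecule containing the site twice.
cuts, fragments = restriction_map("AAGGATCCTTTTGGATCCA", "GGATCC")
print(cuts)       # [2, 12]
print(fragments)  # [2, 10, 7]
```

The ordered list of fragment lengths is the kind of pattern that can be compared between an optical map and an in-silico digest of assembled contigs.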
  5. Optical Maps: workflow
     Steps                                   Time    Notes
     DNA extraction directly from culture    3-8h
     Quality control of extracted material   1h
     Prepare a chip                          1.5h
     Run Argus System                        1h
     Data assembly                           2-8h
  6. Closing genomes with Optical Maps
     • Reconstructs de novo the parts missing in the reference strain
     • Correctly assembles long tandem repeats
     [Diagram labels: De novo assembly (Illumina, PacBio); set of unordered and unoriented contigs; Optical Map; Contigs]
  7. Case Study: Combining all the technologies
     ~15 Mbp genome sequenced at high coverage with:
     • Illumina HiSeq:
       ◦ 500X PE libraries (180 bp and 650 bp insert)
       ◦ 150X MP library (3 Kbp)
       ◦ 150X MP library (7 Kbp)
     • PacBio:
       ◦ 50/60X with reads longer than 2 Kbp
     • OpGen:
       ◦ 3 chips (only one worked really well)
       ◦ 300X coverage
       ◦ Average map length 320 Kbp
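The coverage figures above follow from simple arithmetic: coverage is the number of sequenced bases divided by the genome size. A rough sketch, assuming the ~15 Mbp genome size and the 375 Mbp per SMRT Cell yield quoted in this talk; the function names are hypothetical, and the result is only a lower bound because the PacBio coverage above counts reads longer than 2 Kbp only.

```python
def required_bases(genome_size_bp, coverage):
    """Total bases needed to reach a given average coverage."""
    return genome_size_bp * coverage


def runs_needed(genome_size_bp, coverage, yield_per_run_bp):
    """Smallest whole number of runs whose combined yield reaches the target."""
    needed = required_bases(genome_size_bp, coverage)
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-needed // yield_per_run_bp)


GENOME_SIZE = 15_000_000      # ~15 Mbp, from the case study
SMRT_CELL_YIELD = 375_000_000  # 375 Mbp per SMRT Cell, from the talk

print(required_bases(GENOME_SIZE, 500))               # 7500000000 bases for 500X
print(runs_needed(GENOME_SIZE, 50, SMRT_CELL_YIELD))  # 2
print(runs_needed(GENOME_SIZE, 60, SMRT_CELL_YIELD))  # 3
```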
  8. Assembly Strategy
     https://github.com/vezzi/de_novo_scilife
     Semi-automated pipeline for de novo assembly:
     • Global configuration file → tools and system configuration
     • Sample configuration file → samples description
     3 modules:
     1. QC-module (Illumina only):
        • Adaptor removal, kmer-analysis, fastqc, (insert size estimation)
     2. Assemble-module (Illumina only):
        • Runs specified assemblers and outputs executed commands
     3. Validation-module:
        • FRCbam, coverage analysis, GC-analysis, (N50)
     I NEED USERS/FEEDBACK/CONTRIBUTIONS
  9. QC-Module
     Kmer analysis:
     • Samples complexity
     • Error rate
     • Heterozygosity
     FASTQC
     Adaptor removal
     Alignment (partial assembly)
     [Figure: Insert Size Histogram for All_Reads in file lib_3000.bam; x-axis: Insert Size; y-axis: Count; series: FR, RF, TANDEM]
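The k-mer analysis listed above is usually read off a k-mer spectrum, i.e. how many distinct k-mers occur once, twice, and so on. The following is a minimal sketch of that computation, not the pipeline's own code: the function names and toy reads are assumptions, and reverse complements are not merged into canonical k-mers as dedicated counting tools typically do.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer in the reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers with that multiplicity."""
    return Counter(counts.values())


# Toy reads (hypothetical); the second differs from the first by one base.
reads = ["ACGTACGT", "ACGTTCGT"]
spectrum = kmer_spectrum(kmer_counts(reads, 4))
print(sorted(spectrum.items()))  # [(1, 7), (3, 1)]
```

On real data, a large peak at multiplicity 1 typically reflects sequencing errors, while a secondary peak at about half the main coverage peak can indicate heterozygosity.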
  10. Assemble-Module
      Illumina only:
      • SOAPdenovo
      • MaSuRCA
      • Allpaths-LG
      PacBio only:
      • HGAP
      • CABOG
      Hybrid:
      • PB-jelly (HAH)

      >5000         #scaffolds   totalLength   maxContigLength   N50      N80      percentageNs
      Allpaths-LG   227          14513103      596012            139364   57619    15%
      MaSuRCA       163          18549484      1188669           526519   282507   2%
      HGAP          290          14399273      763592            142483   37117    0%
      PB-Jelly      179          14718213      747750            195225   85127    13%

      • Trial-and-error process
      • Automated pipeline developed to streamline these analyses
      • MaSuRCA is surprisingly the “best” assembler
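The N50 and N80 columns in the table above are standard contiguity statistics: Nxx is the length of the contig at which the running total of lengths, sorted from longest to shortest, first reaches xx% of the total assembly length. A minimal sketch, assuming a hypothetical function name `nxx` and made-up contig lengths rather than the real assemblies:

```python
def nxx(lengths, percent):
    """Return the Nxx statistic (e.g. percent=50 for N50) of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison: running / total >= percent / 100.
        if running * 100 >= total * percent:
            return length
    return 0  # only reached for an empty input


# Hypothetical contig lengths, total 290.
contigs = [80, 70, 50, 40, 30, 20]
print(nxx(contigs, 50))  # 70
print(nxx(contigs, 80))  # 40
```

A higher N50 means more of the assembly sits in long sequences, but it says nothing about correctness, which is why the validation step matters.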
  11. Validation-Module
      [Figure panels: MaSuRCA, HGAP, PB-Jelly (HAH)]
  12. Validation-Module: FRCbam
      The PacBio-only assembly clearly outperforms the others
  13. Optical Maps
      PacBio produces the best assembly; however, 290 contigs are produced.
      Optical Maps made it possible to obtain the 2D representation of the 7 chromosomes.
      N.B. The chromosome number was one of the biological questions of this project!
      But much more can be done!
  14. Incredible tool to finish (or almost finish) genomes
      Combining all the technologies

      Combination                      % contigs placed   Total size of placed contigs   % size placed contigs   % genome covered
      PacBio+OpGen                     94.12              11578995                       97%                     77.05
      Allpaths+OpGen                   71.88              10692027                       84%                     52.88
      Allpaths+MaSuRCA+OpGen           80.65              27506424                       92%                     69.64
      Allpaths+PacBio+OpGen            82.32              22271022                       91%                     83.05
      MaSuRCA+PacBio+OpGen             94.44              28393392                       98%                     83.79
      Allpaths+MaSuRCA+PacBio+OpGen    85.42              39085419                       94%                     87.39
  15. Conclusions: take-home message
      Attempt to automate the de novo assembly process:
      • https://github.com/vezzi/de_novo_scilife
      • Not 100% automated
      Illumina, PacBio, Hybrid assemblies:
      • PacBio alone seems to produce the best assemblies
      • Hybrid assembly does not seem able to correct merged-assembly problems
      Mixing technologies is always a good idea:
      • Makes it possible to compensate for technological biases
      • Allows better assemblies to be produced
  16. Thanks
      https://github.com/vezzi/de_novo_scilife