Algorithms and filters used to improve the Tribolium draft assembly with physical maps based on imaging ultra-long single DNA molecules

Jennifer Shelton
2014
Assembly Pipeline
3) Use the sequence reference to adjust molecule stretch for each scan.
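Assembly Pipeline
In recent datasets, when the signal-to-noise ratio (SNR) is low and alignment is good, we see a spike in bases per pixel (bpp) in the first scan of a flow cell, followed by a plateau and then a lower plateau.
[Figure: bpp by scan, with the first scan in a flow cell marked]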
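Step 3 rescales molecules so that their label spacing agrees with the sequence reference, one scan at a time. A minimal sketch, assuming that for each scan we already have matched label intervals (pixels on the molecule images, bp on the reference) from an alignment; the helper names `estimate_bpp` and `rescale_positions`, the toy numbers, and the median-of-ratios estimator are illustrative assumptions, not the pipeline's documented method.

```python
from statistics import median


def estimate_bpp(pixel_intervals, reference_intervals_bp):
    """Estimate bases per pixel (bpp) for one scan (hypothetical helper).

    Each pixel interval is the distance between two adjacent aligned labels
    on a molecule image; the matching reference interval is the distance in
    bp between the same two labels on the sequence reference. The median
    ratio limits the influence of a few poorly matched intervals.
    """
    if not pixel_intervals or len(pixel_intervals) != len(reference_intervals_bp):
        raise ValueError("need equal-length, non-empty interval lists")
    ratios = [bp / px for px, bp in zip(pixel_intervals, reference_intervals_bp) if px > 0]
    if not ratios:
        raise ValueError("no positive pixel intervals")
    return median(ratios)


def rescale_positions(label_pixels, bpp):
    """Convert label positions from pixels to bp using one scan's bpp."""
    return [px * bpp for px in label_pixels]


# Toy input: {scan number: (pixel intervals, reference intervals in bp)}.
scans = {
    1: ([20, 40, 10], [10_000, 20_400, 5_100]),
    2: ([20, 40, 10], [9_000, 18_000, 4_500]),
}
bpp_by_scan = {scan: estimate_bpp(px, bp) for scan, (px, bp) in scans.items()}
```

With these toy numbers scan 1 gets 510 bpp and scan 2 gets 450 bpp, the kind of first-scan spike described above; each scan's molecules would then be rescaled with their own value.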
Assembly Pipeline 
5) Use the sequence reference to determine assembly noise parameters.
The estimated genome size is used to set the p-value threshold.
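The slide does not state how genome size is turned into a threshold. Purely as a hypothetical illustration, the sketch below assumes the threshold scales inversely with genome size, anchored so that a 3,000 (Mb) genome gets 1e-9; the function name and both anchor values are placeholders, not settings taken from this pipeline or from Bionano documentation.

```python
def p_value_threshold(genome_size_mb, ref_size_mb=3000.0, ref_threshold=1e-9):
    """Hypothetical rule: scale a reference p-value threshold by genome size.

    A larger genome offers more chances for a spurious match, so it gets a
    stricter (smaller) threshold; a smaller genome gets a looser one. The
    anchor values ref_size_mb and ref_threshold are placeholders.
    """
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return ref_threshold * ref_size_mb / genome_size_mb


# Tribolium's estimated genome size is ~200 (Mb).
tribolium_threshold = p_value_threshold(200)
```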
Assembly Pipeline 
6/7) Variants of the starting p-value threshold and the default minimum molecule length are explored in nine assemblies.
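Nine assemblies correspond naturally to a 3 × 3 grid: three p-value variants crossed with three minimum-length variants. A minimal sketch of building that grid; the function name `assembly_grid`, the multipliers, the length offsets, and the example starting values are hypothetical placeholders, not the values used for Tribolium.

```python
from itertools import product


def assembly_grid(start_p_value, default_min_len_kb,
                  p_factors=(0.1, 1.0, 10.0), len_offsets_kb=(-30, 0, 30)):
    """Return the nine (p-value, minimum molecule length) combinations.

    Three p-value variants (stricter, starting, looser) are crossed with
    three minimum-length variants around the default. The factors and
    offsets are placeholders.
    """
    return [
        {"p_value": start_p_value * factor, "min_len_kb": default_min_len_kb + offset}
        for factor, offset in product(p_factors, len_offsets_kb)
    ]


# Placeholder starting values, for illustration only.
runs = assembly_grid(start_p_value=1e-8, default_min_len_kb=150)
for i, params in enumerate(runs, start=1):
    print(f"assembly {i}: p-value {params['p_value']:.0e}, min length {params['min_len_kb']} kb")
```

The middle entry of the grid reproduces the starting settings, so the other eight runs can be compared against it.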
Current Tribolium sequence-based assembly
Input file | N50 (Mb) | Number of contigs | Cumulative length (Mb)
Genome FASTA | 1.16 | 2240 | 160.74
in silico CMAP from FASTA | 1.20 | 223 | 152.53
The 223 scaffolds from the sequence-based assembly that were longer than 20 (kb) and had more than 5 labels were converted into in silico CMAPs.
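Converting a scaffold into an in silico CMAP means locating the nicking-enzyme recognition sites on both strands and keeping only scaffolds that pass the length and label filters above. A minimal sketch; the recognition motif `GCTCTTC` (Nt.BspQI) is an assumption, since the slide does not name the enzyme, the helper names are hypothetical, and real conversion tools handle details (such as merging labels that are too close to resolve) that this ignores.

```python
import re

MOTIF = "GCTCTTC"  # assumed nicking site (Nt.BspQI); not stated in the slides
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def label_positions(seq, motif=MOTIF):
    """Return sorted 1-based start positions of the motif on either strand."""
    seq = seq.upper()
    reverse_complement = motif.translate(COMPLEMENT)[::-1]
    hits = set()
    for pattern in (motif, reverse_complement):
        # Zero-width lookahead so overlapping matches are all reported.
        hits.update(m.start() + 1 for m in re.finditer(f"(?={pattern})", seq))
    return sorted(hits)


def passes_cmap_filter(seq, min_len_bp=20_000, min_labels=5):
    """Keep scaffolds longer than 20 (kb) with more than 5 labels."""
    return len(seq) > min_len_bp and len(label_positions(seq)) > min_labels
```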
Assembly Results
Input file | N50 (Mb) | Number of contigs | Cumulative length (Mb)
Genome FASTA | 1.16 | 2240 | 160.74
in silico CMAP from FASTA | 1.20 | 223 | 152.53
CMAP from assembled BNG molecules (BNG CMAP) | 1.35 | 216 | 200.47
The CMAP from assembled BNG molecules had a higher N50 and a longer cumulative length than the sequence assembly.
The estimated size of the Tribolium genome is ~200 (Mb).
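N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length. A minimal sketch of that standard definition, applied to any list of contig or map lengths; the function name `n50` is our own.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("empty input")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half the total.
        if running >= half:
            return length
    raise ValueError("unreachable for non-empty input")
```

Because N50 weights long sequences, the BNG CMAP can have a higher N50 than the in silico CMAP even with a similar number of maps.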
Simplest XMAP alignment description
[Figure: in silico CMAP 1 (1 Mb) and in silico CMAP 2 (1.1 Mb), from the genome FASTA, align to BNG CMAP 1 (1.1 Mb) and BNG CMAP 2 (1.3 Mb), assembled from molecules]
Breadth of alignment coverage for in silico CMAP: 2.1 (Mb)
Total alignment length for in silico CMAP: 2.1 (Mb)
Breadth of alignment coverage for BNG CMAP: 2.4 (Mb)
Total alignment length for BNG CMAP: 2.4 (Mb)
Complex XMAP alignment description
[Figure: in silico CMAP 1 (1 Mb), from the genome FASTA, aligns to both BNG CMAP 1 (1.1 Mb) and BNG CMAP 2 (1.3 Mb), assembled from molecules]
Breadth of alignment coverage for in silico CMAP: 1 (Mb)
Total alignment length for in silico CMAP: 2 (Mb)
Breadth of alignment coverage for BNG CMAP: 2.4 (Mb)
Total alignment length for BNG CMAP: 2.4 (Mb)
Alignment of CMAPs
[Figure: in silico CMAP 1 (1 Mb), from the genome FASTA, aligns to both BNG CMAP 1 (1.1 Mb) and BNG CMAP 2 (1.3 Mb), assembled from molecules]
Comparing the breadth of alignment coverage with the total aligned length can reveal relevant relationships between assemblies.
In this example, differences between "breadth" and "total" length could be due to:
- duplications in the sample the molecules were extracted from
- assembly of alternate haplotypes
- mis-assembly creating redundant contigs
- a collapsed repeat in the sequence assembly
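Breadth and total length differ only in how overlapping alignments are counted: breadth counts each base of a map once, total counts it once per alignment. A minimal sketch, assuming each alignment has been reduced to an interval `(map_id, start, end)` on the maps being measured; the function name `breadth_and_total` and the toy intervals (in kb) are illustrative.

```python
def breadth_and_total(intervals):
    """Return (breadth, total) for aligned intervals on one set of maps.

    breadth: length of the union of intervals on each map, summed over maps.
    total: sum of all interval lengths, so overlaps are counted repeatedly.
    Intervals are (map_id, start, end) with end > start.
    """
    total = sum(end - start for _, start, end in intervals)
    by_map = {}
    for map_id, start, end in intervals:
        by_map.setdefault(map_id, []).append((start, end))
    breadth = 0
    for spans in by_map.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # overlapping or touching: extend the run
                cur_end = max(cur_end, end)
            else:  # disjoint: close the current run and start a new one
                breadth += cur_end - cur_start
                cur_start, cur_end = start, end
        breadth += cur_end - cur_start
    return breadth, total


# Complex example in kb: in silico CMAP 1 is covered by two BNG alignments.
complex_in_silico = breadth_and_total([("is1", 0, 1000), ("is1", 0, 1000)])
```

For the complex example this gives a breadth of 1,000 (kb) and a total of 2,000 (kb), matching the 1 (Mb) and 2 (Mb) on the slide.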
Alignment of BNG assembly to reference genome
CMAP name | Breadth of alignment coverage (Mb) | Total alignment length (Mb) | Percent of CMAP aligned (%)
in silico CMAP from FASTA | 124.04 | 132.40 | 81
CMAP from assembled BNG molecules (BNG CMAP) | 131.64 | 132.34 | 67
Close to 4% of the alignment of the in silico CMAP appears to be redundant.
Overall, 81% of the in silico CMAP aligns to the BNG consensus maps.
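The table does not define "percent of CMAP aligned". One reading, assumed here, is the breadth of alignment coverage divided by the cumulative CMAP length from the Assembly Results table; for the in silico CMAP this gives 124.04 / 152.53 ≈ 81%. The function name `percent_aligned` is our own.

```python
def percent_aligned(breadth_mb, cumulative_length_mb):
    """Assumed definition: share of a CMAP set covered by at least one alignment."""
    if cumulative_length_mb <= 0:
        raise ValueError("cumulative length must be positive")
    return 100.0 * breadth_mb / cumulative_length_mb


in_silico_percent = percent_aligned(124.04, 152.53)
```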
Alignment of BNG assembly to reference genome
[Figure: ChLG 9 super-scaffold, BNG consensus maps, and ChLG 9 scaffolds 130, 131, 133, 134, 132, 129, 135, 127, 136, 137]
Typically, where redundant alignments occur, two BNG consensus maps align to the same region, suggesting that they represent haplotypes, although this has not been verified.
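Candidate haplotype pairs can be listed by looking for two different BNG consensus maps whose alignments overlap on the same in silico CMAP. A minimal sketch, assuming alignments are given as `(insilico_id, bng_id, start, end)` in in silico coordinates; the function name `redundant_pairs` and the toy IDs are hypothetical. The output flags candidates only; as the slide notes, haplotype status still needs verification.

```python
from itertools import combinations


def redundant_pairs(alignments, min_overlap=1):
    """Find pairs of BNG maps aligning to the same in silico CMAP region.

    alignments: list of (insilico_id, bng_id, start, end).
    Returns sorted tuples (insilico_id, bng_a, bng_b, overlap_length).
    """
    by_insilico = {}
    for insilico_id, bng_id, start, end in alignments:
        by_insilico.setdefault(insilico_id, []).append((bng_id, start, end))
    pairs = []
    for insilico_id, hits in by_insilico.items():
        for (a, s1, e1), (b, s2, e2) in combinations(hits, 2):
            if a == b:
                continue  # same BNG map aligned twice is a different issue
            overlap = min(e1, e2) - max(s1, s2)
            if overlap >= min_overlap:
                pairs.append((insilico_id, *sorted((a, b)), overlap))
    return sorted(pairs)
```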
Tribolium super-scaffolds overlapping BNG CMAPs
[Figure: ChLG 9 super-scaffold, BNG consensus maps, and ChLG 9 scaffolds 128, 130, 131, 133, 134, 132]
Alignment of BNG assembly to reference genome was used to super-scaffold the Tribolium scaffolds
[Figure: in silico CMAPs 1 (+), 2 (+), 3 (-) and 4 (+) aligned to BNG CMAP 1 and BNG CMAP 2]
Stitch.pl estimates super-scaffolds from alignments between the scaffolds (as in silico CMAPs) and the assembled BNG molecules, made with BNG RefAligner. The in silico CMAPs are aligned as the reference.
The alignment is then inverted and used as input for stitch.
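Inverting an alignment swaps the roles of query and reference, so the BNG map becomes the reference and each in silico CMAP becomes a query placed on it. A minimal sketch, assuming the XMAP convention that reference start < reference end and that query start > query end for '-' alignments; the `Alignment` field names are hypothetical shorthand for the XMAP columns QryContigID, RefContigID, QryStartPos, QryEndPos, RefStartPos, RefEndPos, Orientation and Confidence, and this is not stitch.pl's own code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical subset of XMAP columns (positions in bp)."""
    qry_id: int
    ref_id: int
    qry_start: float
    qry_end: float
    ref_start: float
    ref_end: float
    orientation: str  # '+' or '-'
    confidence: float


def invert(aln: Alignment) -> Alignment:
    """Swap query and reference, keeping XMAP coordinate conventions."""
    # New reference coordinates must ascend, so order the old query span.
    new_ref_start, new_ref_end = sorted((aln.qry_start, aln.qry_end))
    if aln.orientation == "+":
        new_qry_start, new_qry_end = aln.ref_start, aln.ref_end
    else:
        # In a reversed match the old ref end pairs with the new ref start,
        # so the new query runs high-to-low.
        new_qry_start, new_qry_end = aln.ref_end, aln.ref_start
    return Alignment(
        qry_id=aln.ref_id, ref_id=aln.qry_id,
        qry_start=new_qry_start, qry_end=new_qry_end,
        ref_start=new_ref_start, ref_end=new_ref_end,
        orientation=aln.orientation, confidence=aln.confidence,
    )


# Toy '-' alignment: BNG query 1 on in silico reference 2.
example = Alignment(1, 2, 500.0, 100.0, 1000.0, 1400.0, "-", 15.0)
inverted = invert(example)
```

Inverting twice returns the original record, which is a quick sanity check on the coordinate handling.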
Alignments are filtered based on alignment length relative to the total possible alignment length, and on confidence.
Stitch.pl checks each alignment length against the potential alignment length to find relevant global, rather than local, alignments. An alignment passes if its length is greater than 30% of the potential alignment length and fails if it is less than 30%.
[Figure build: the alignments of in silico CMAPs 1 to 4 to BNG CMAP 1 and BNG CMAP 2 are checked one at a time; some pass and some fail]
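The slides give the 30% cut-off but not the exact definition of "potential alignment length". A minimal sketch under one assumed definition: project the whole in silico CMAP onto BNG-map coordinates from the alignment anchor, clip the projection to the BNG map ends, and compare the aligned length with that clipped length. The function names are hypothetical, and coordinates follow the XMAP-style convention that query start > query end for '-' alignments.

```python
def potential_alignment_length(qry_len, qry_start, ref_len, ref_start, orientation):
    """Length of the in silico CMAP (query) that could overlap the BNG map.

    Assumed definition (hypothetical): project the full query onto reference
    coordinates using the alignment's start anchor, then clip to [0, ref_len].
    """
    if orientation == "+":
        lo = ref_start - qry_start
        hi = lo + qry_len
    else:
        # Query coordinates decrease as reference coordinates increase.
        hi = ref_start + qry_start
        lo = hi - qry_len
    return max(0.0, min(hi, ref_len) - max(lo, 0.0))


def passes_global_filter(qry_len, qry_start, qry_end, ref_len, ref_start,
                         orientation, min_fraction=0.30):
    """Pass if the aligned length exceeds min_fraction of the potential length."""
    aligned = abs(qry_end - qry_start)
    potential = potential_alignment_length(qry_len, qry_start, ref_len, ref_start, orientation)
    return potential > 0 and aligned / potential > min_fraction
```

Under this definition a 200 (kb) match in the middle of a 1 (Mb) scaffold that sits fully inside a BNG map fails (20%), while a scaffold that runs off the end of the BNG map is judged only on the part that could overlap.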
High-quality scaffolding alignments are then filtered for the longest, highest-confidence alignment for each in silico CMAP, and the passing alignments are used to super-scaffold.
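Keeping one alignment per in silico CMAP is a group-by followed by a maximum. A minimal sketch, assuming each alignment is a dict with hypothetical keys `insilico_id`, `bng_id`, `aligned_len` and `confidence`; ranking by length first and confidence second is our assumption about how "longest and highest confidence" is combined.

```python
def best_per_cmap(alignments):
    """Keep the longest, then highest-confidence, alignment per in silico CMAP."""
    best = {}
    for aln in alignments:
        key = aln["insilico_id"]
        rank = (aln["aligned_len"], aln["confidence"])
        current = best.get(key)
        if current is None or rank > (current["aligned_len"], current["confidence"]):
            best[key] = aln
    return best


# Toy alignments (hypothetical values).
toy_alignments = [
    {"insilico_id": 1, "bng_id": 1, "aligned_len": 900, "confidence": 12},
    {"insilico_id": 1, "bng_id": 2, "aligned_len": 900, "confidence": 20},
    {"insilico_id": 2, "bng_id": 2, "aligned_len": 500, "confidence": 30},
]
chosen = best_per_cmap(toy_alignments)
```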
Stitch is iterated, and additional super-scaffolding alignments are found until all super-scaffolds are joined. Iteration takes advantage of alignments where sequence-based scaffolds stitch BNG consensus maps together.
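If every passing link were accepted, repeated stitching passes would converge to the connected components of a graph whose nodes are in silico CMAPs and BNG maps and whose edges are passing alignments. The sketch below computes that end state directly with union-find (disjoint sets), a technique swapped in for illustration rather than stitch.pl's iteration; the function name `join_groups` is hypothetical, and ordering, orientation and re-filtering between passes are ignored.

```python
def join_groups(edges):
    """Group in silico CMAPs linked directly or through shared BNG maps.

    edges: (insilico_id, bng_id) pairs from passing alignments. A scaffold
    aligned to two BNG maps bridges them, which is what iteration exploits.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for insilico_id, bng_id in edges:
        union(("is", insilico_id), ("bng", bng_id))

    groups = {}
    for node in list(parent):
        if node[0] == "is":
            groups.setdefault(find(node), set()).add(node[1])
    return sorted(sorted(group) for group in groups.values())
```

Scaffold 2 aligning to both BNG map A and BNG map B is what places scaffolds 1 and 3 in the same group below.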
[Figure: the resulting super-scaffold joining in silico CMAPs 1 to 4 across BNG CMAP 1 and BNG CMAP 2]
If a gap length is estimated to be negative, the gap is represented by a 100 (bp) filler.
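Once scaffolds are placed on a BNG map, each gap is the distance from the end of one scaffold to the start of the next, and a negative estimate is replaced by the 100 (bp) filler. A minimal sketch, assuming each scaffold's projected span on the BNG map is already known; the names `gap_lengths` and `build_super_scaffold` and the toy spans are hypothetical, and reverse-complementing '-' scaffolds is omitted for brevity.

```python
FILLER_BP = 100  # gap length used when the estimated gap is negative


def gap_lengths(placements):
    """Estimate gaps between consecutive in silico CMAPs on one BNG map.

    placements: list of (insilico_id, start, end) projected spans in bp.
    Returns (left_id, right_id, estimated_gap, gap_used) tuples.
    """
    ordered = sorted(placements, key=lambda p: p[1])
    gaps = []
    for (left, _, left_end), (right, right_start, _) in zip(ordered, ordered[1:]):
        estimated = right_start - left_end
        used = estimated if estimated >= 0 else FILLER_BP
        gaps.append((left, right, estimated, used))
    return gaps


def build_super_scaffold(order, seqs, gaps):
    """Join sequences in `order`, inserting a run of N for each gap used."""
    parts = [seqs[order[0]]]
    for (_, _, _, used), next_id in zip(gaps, order[1:]):
        parts.append("N" * int(used))
        parts.append(seqs[next_id])
    return "".join(parts)


placements = [("a", 0, 1_000), ("b", 1_500, 3_000), ("c", 2_800, 4_000)]  # toy spans
gaps = gap_lengths(placements)
```

Here the gap between "b" and "c" is estimated at -200 (bp), so 100 Ns are inserted instead; keeping the raw estimate alongside the filler is what makes the gap-length distribution below possible.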
Gap lengths
[Figure: Distribution of gap lengths for automated output; histogram of count against gap length (bp), with negative and positive gap lengths marked]
Of the automated stitch.pl Tribolium super-scaffolds, 66 gaps had known lengths and 26 had negative lengths (set to 100 (bp)).
Of the manually edited Tribolium super-scaffolds, 66 gaps had known lengths and 24 had negative lengths (set to 100 (bp)).
Negative gap lengths
[Figure: alignment of in silico CMAPs 22, 23, 129, 136 and 137 to BNG consensus maps, annotated: "Is part of scaffold_23 connected to 136? I went with the second alignment (21-26 together and 136-137 together, because it is supported by genetic maps), but we should check these assemblies. In the bottom alignment of 136 you can see that a large section of BNG map 32 (which joins 23 to 136) is a duplicate in the BNG assembly?"]
The longest negative gap length is from a BNG consensus map joining in silico CMAPs 23 and 136.
Because the same region of 136 aligns to another BNG consensus map that aligns to its chromosome linkage group, this alignment was rejected and stitch was re-run.
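The rejection rule applied here (and again for scaffold 81 below) can be written as a filter: drop an alignment when the same scaffold region is also covered by another BNG map that belongs to the scaffold's own linkage group. A minimal sketch of that manual rule; the function name `reject_conflicting`, the input layout, and the assumption that linkage groups come from the genetic map (for scaffolds) and from other aligned scaffolds (for BNG maps) are all hypothetical.

```python
def reject_conflicting(alignments, bng_chlg, scaffold_chlg):
    """Drop alignments contradicted by a same-region, same-linkage-group map.

    alignments: list of (scaffold_id, bng_id, start, end) on the scaffold.
    bng_chlg: bng_id -> linkage group inferred from its other alignments.
    scaffold_chlg: scaffold_id -> linkage group from the genetic map.
    """
    kept = []
    for s_id, b_id, start, end in alignments:
        home = scaffold_chlg.get(s_id)
        if bng_chlg.get(b_id) == home:
            kept.append((s_id, b_id, start, end))
            continue
        # Another BNG map from the scaffold's own group covering this region?
        conflict = any(
            s2 == s_id and b2 != b_id and bng_chlg.get(b2) == home
            and min(end, e2) > max(start, st2)
            for s2, b2, st2, e2 in alignments
        )
        if not conflict:
            kept.append((s_id, b_id, start, end))
    return kept


# Toy case: scaffold s1 (group 9) is covered by mA (group 2) and mB (group 9).
toy = [("s1", "mA", 0, 500), ("s1", "mB", 100, 400), ("s2", "mA", 0, 300)]
kept = reject_conflicting(toy, {"mA": "2", "mB": "9"}, {"s1": "9", "s2": "2"})
```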
Negative gap lengths
[Figure (min confidence 10): ChLG 2 scaffolds, ChLG 2 super-scaffolds and BNG consensus maps; scaffold labels 130, 131, 133, 134, 132, 129, 135, 127, 136, 137, 139, 138, 140, 141, 142, 143, 144, 145 and U, 18, 14, 16, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30. Annotation: "scaffold_133 aligns but is not visible in IrysView? Why is Super_scaffold_65 backwards?"]
Two new super-scaffolds were created, and their sequence similarity is being evaluated.
Gap lengths
[Figure: Distribution of gap lengths for automated output; histogram of count against gap length (bp), with negative and positive gap lengths marked]
This negative gap length also indicated a potential assembly issue.
Negative gap lengths
This negative gap length is from a BNG consensus map joining in silico CMAPs 81, 102 and 103.
Half of scaffold_81 aligns with ChLG7.
[Figure: in silico CMAPs 79, 80, 81, 82 and 83 aligned to BNG consensus maps; annotation: "Half of scaffold_81 aligns with ChLG7"]
Because the other half of 81 aligns to another BNG consensus map that aligns to its chromosome linkage group, this alignment was rejected and stitch was re-run.
The BNG maps suggest a sequence-level mis-assembly of in silico CMAP 81.
Gap lengths
[Figure: Distribution of gap lengths for automated output; histogram of count against gap length (bp), with gaps below -40,000 (bp) shaded]
All strongly negative gap lengths (< -40,000 (bp), shaded) were independently flagged as potential sequence mis-assemblies to be checked at the sequence level.
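Flagging is a simple threshold on the raw gap estimates, kept before negative gaps are replaced by fillers. A minimal sketch; the function name `summarize_gaps` and the toy gap values are hypothetical, and only the -40,000 (bp) cut-off comes from the slide.

```python
def summarize_gaps(gaps_bp, flag_below_bp=-40_000):
    """Count gaps by sign and flag strongly negative ones.

    Returns (n_nonnegative, n_negative, flagged_indices); flagged gaps are
    candidates for sequence-level mis-assembly checks.
    """
    n_negative = sum(1 for gap in gaps_bp if gap < 0)
    flagged = [i for i, gap in enumerate(gaps_bp) if gap < flag_below_bp]
    return len(gaps_bp) - n_negative, n_negative, flagged
```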
All gaps from the shaded regions were also manually rejected, and stitch.pl was rerun without them to produce the current super-scaffolded assembly.
We suspect that strongly negative gap sizes may be useful for locating sequence mis-assemblies.
Tribolium super-scaffolds
Input file | N50 (Mb) | Number of contigs | Cumulative length (Mb)
genome FASTA | 1.16 | 2240 | 160.74
super-scaffold FASTA | 4.46 | 2150 | 165.92
The N50 of the super-scaffolded genome was ~4 times greater than the original (4.46 vs. 1.16 (Mb)).
Super-scaffolds tend to agree with the Tribolium genetic map.
For Tribolium:
first minimum percent aligned = 30%
first minimum confidence = 13
second minimum percent aligned = 90%
second minimum confidence = 8
Lower-quality alignments were manually selected if the genetic map also supported the order.
Complex scaffolds were broken manually for sequence-level evaluation.
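The slide lists two pairs of thresholds without saying how they combine. One plausible reading, assumed here, is that an alignment is accepted if it meets either pair, so a nearly complete alignment (at least 90% of its potential length) can pass with lower confidence (at least 8) than a partial one (at least 30% with confidence at least 13). The name `accept`, the `TIERS` layout, and the inclusive comparisons are assumptions.

```python
# (minimum fraction of potential alignment length, minimum confidence)
TIERS = (
    (0.30, 13),  # first minimum percent aligned, first minimum confidence
    (0.90, 8),   # second minimum percent aligned, second minimum confidence
)


def accept(fraction_aligned, confidence, tiers=TIERS):
    """Accept an alignment if it satisfies any (fraction, confidence) tier."""
    return any(fraction_aligned >= f and confidence >= c for f, c in tiers)
```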
Tribolium super-scaffolds
[Figure (min confidence 10): ChLG X scaffolds U, 3, 4, 5, 6, 7, U, 8, 9, 10, 11, 12, 13 aligned to BNG consensus maps and joined into the ChLG X super-scaffold]
From ChLG X, 11 of the previous 13 scaffolds were joined with two unplaced scaffolds (U) into one super-scaffold.
ChLG X was reduced from 13 scaffolds to 2, with one scaffold being moved to ChLG 3.
Tribolium super-scaffolds
[Figure (min confidence 10): ChLG 3 scaffolds 32, 33, 34, 35, 36, 2, 37, 38, 39, 40, 41, 42 and 51, U, 43, 45, 44, 46 aligned to BNG consensus maps and ChLG 3 super-scaffolds]
The second scaffold from ChLG X aligned to scaffolds from a portion of ChLG 3.
Two unplaced scaffolds (U) aligned to ChLG X.
The 4% redundancy in the alignment may come from assembly of haplotypes (generally observed as two BNG consensus maps aligning to the same in silico map).
Tribolium super-scaffolds overlapping BNG CMAPs
[Figure (min confidence 10): ChLG X super-scaffold, ChLG X scaffolds U, 3, 4, 5, 6, 7, U, 8, 9, 10, 11, 12, 13 and overlapping BNG consensus maps]
Tribolium super-scaffolds
[Figure (min confidence 10): ChLG 9 scaffolds 128, 130, 131, 133, 134, 132, 129, 135, 127, 136, 137, 139, 138, 140, 141, 142, 143, 144, 145, BNG consensus maps and ChLG 9 super-scaffolds. Annotation: "ChLG9 currently (2150 scaffold_133 aligns but is not visible in IrysView?) why is Super_scaffold_65 backwards?"]
For ChLG 9, 21 scaffolds were reduced to 9.
Tribolium super-scaffolds
[Figure (min confidence 10): ChLG 5 scaffolds 69, 68, 70, 71, 72, 73, 74, U, 75, 76, 77, 78, 79, 80, 81, 82, 83, BNG consensus maps and ChLG 5 super-scaffolds]
For ChLG 5, 17 scaffolds were reduced to 4.
Acknowledgements
K-INBRE Bioinformatics Core
Susan Brown - PI
Nic Herndon - script development
Nanyan Lu - manual editing
Michelle Coleman - extractions and running the Irys
Zachary Sliefert - metric summaries

Bionano Genomics
Ernest Lam - assembly pipeline best practices assistance
Weiping Wang - assistance with data formats
Palak Sheth - collaboration to standardize analysis

Script availability
https://github.com/i5K-KINBRE-script-share/Irys-scaffolding
BNG scripts available by request from BNG

TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docx
 

Bng presentation draft

  • 1. Algorithms and filters used to improve the Tribolium draft Assembly with Physical Maps Based on Imaging Ultra-Long Single DNA Molecules. Jennifer Shelton, 2014
  • 2. Assembly Pipeline: 3) Use the sequence reference to adjust molecule stretch for each scan.
  • 3. Assembly Pipeline: In recent datasets, when SNR is low and alignment is good, we see a spike in bases per pixel (bpp) in the first scan of a flow cell, followed by a plateau and then a lower plateau.
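Slides 2 and 3 describe correcting molecule stretch per scan. A minimal sketch, assuming label positions were first converted to bp with a nominal bpp value and that a per-scan bpp has already been estimated by aligning that scan's molecules to the in silico reference. The 500 bpp nominal value, the function names, and the data layout are hypothetical, not the BNG pipeline's API.

```python
from typing import Dict, List

# Assumed nominal bases per pixel used for the first bp conversion;
# an illustrative value, not taken from the slides.
NOMINAL_BPP = 500.0


def rescale_positions(positions_bp: List[float], estimated_bpp: float,
                      nominal_bpp: float = NOMINAL_BPP) -> List[float]:
    """Rescale label positions (bp at nominal_bpp) to a scan's estimated bpp."""
    factor = estimated_bpp / nominal_bpp
    return [pos * factor for pos in positions_bp]


def rescale_by_scan(molecules_by_scan: Dict[int, List[List[float]]],
                    bpp_by_scan: Dict[int, float]) -> Dict[int, List[List[float]]]:
    """Apply each scan's own bpp estimate to every molecule imaged in that scan."""
    return {
        scan: [rescale_positions(mol, bpp_by_scan[scan]) for mol in molecules]
        for scan, molecules in molecules_by_scan.items()
    }
```

Correcting per scan rather than per flow cell matters here because the first scan in a flow cell shows its own bpp spike.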
  • 4. Assembly Pipeline: 5) Use the sequence reference to determine assembly noise parameters. The estimated genome size is used to set the p-value threshold.
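The slide does not give the formula linking genome size to the p-value threshold. One plausible form, assumed here, is a Bonferroni-style correction: a base significance level divided by genome size in Mb, so that larger genomes, which offer more chances for spurious matches, get a stricter cutoff. The base level of 1e-5 and the function name are hypothetical.

```python
def pvalue_threshold(genome_size_mb: float, base_alpha: float = 1e-5) -> float:
    """Hypothetical rule: divide a base significance level by genome size in Mb.

    Larger genomes offer more chances for chance matches, so the cutoff tightens.
    """
    if genome_size_mb <= 0:
        raise ValueError("genome size must be positive")
    return base_alpha / genome_size_mb


# The estimated Tribolium genome size on these slides is ~200 Mb.
threshold = pvalue_threshold(200.0)
```

Under these assumptions, the ~200 Mb estimate gives a threshold of about 5 × 10^-8.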
  • 5. Assembly Pipeline: 6/7) Variants of the starting p-value and the default minimum molecule length are explored in nine assemblies.
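Nine assemblies are presumably a 3 × 3 grid: three p-value thresholds crossed with three minimum molecule lengths. The sketch enumerates such a grid. The scaling factors (10× stricter and 10× more relaxed p-values), the ±50 kb length offsets, and the starting values are hypothetical placeholders, not the settings used for Tribolium.

```python
from itertools import product
from typing import List, Tuple


def assembly_grid(base_pvalue: float, base_min_length_kb: int,
                  pvalue_factors: Tuple[float, ...] = (0.1, 1.0, 10.0),
                  length_offsets_kb: Tuple[int, ...] = (-50, 0, 50),
                  ) -> List[Tuple[float, int]]:
    """Hypothetical 3 x 3 sweep: every p-value variant paired with every minimum length."""
    return [(base_pvalue * factor, base_min_length_kb + offset)
            for factor, offset in product(pvalue_factors, length_offsets_kb)]


# Illustrative starting values only; the slides do not list the ones used.
for pvalue, min_length_kb in assembly_grid(5e-8, 150):
    print(f"p-value {pvalue:.0e}, minimum molecule length {min_length_kb} kb")
```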
  • 6. Current Tribolium sequence-based assembly
    Input file | N50 (Mb) | Number of contigs | Cumulative length (Mb)
    Genome FASTA | 1.16 | 2240 | 160.74
    in silico CMAP from FASTA | 1.20 | 223 | 152.53
    The 223 scaffolds from the sequence-based assembly that were longer than 20 kb and had more than 5 labels were converted into in silico CMAPs.
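The selection rule on slide 6 (longer than 20 kb, more than 5 labels) can be sketched as follows. Labels are counted as occurrences of a nicking-enzyme motif on either strand. The slides do not name the enzyme, so the Nt.BspQI site GCTCTTC is only a placeholder. The helper names are hypothetical.

```python
from typing import List

# Placeholder label motif (Nt.BspQI site). The slides do not name the
# enzyme used for Tribolium, so substitute the real one.
MOTIF = "GCTCTTC"
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement; ambiguous bases become N."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def label_sites(seq: str, motif: str = MOTIF) -> List[int]:
    """0-based start positions of the motif on either strand."""
    seq = seq.upper()
    sites = set()
    for pattern in {motif, reverse_complement(motif)}:
        hit = seq.find(pattern)
        while hit != -1:
            sites.add(hit)
            hit = seq.find(pattern, hit + 1)
    return sorted(sites)


def qualifies_for_cmap(seq: str, min_length_bp: int = 20_000, min_labels: int = 5) -> bool:
    """Slide rule: keep scaffolds longer than 20 kb with more than 5 labels."""
    return len(seq) > min_length_bp and len(label_sites(seq)) > min_labels
```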
  • 7. Assembly Results
    Input file | N50 (Mb) | Number of contigs | Cumulative length (Mb)
    Genome FASTA | 1.16 | 2240 | 160.74
    in silico CMAP from FASTA | 1.20 | 223 | 152.53
    CMAP from assembled BNG molecules (BNG CMAP) | 1.35 | 216 | 200.47
    The BNG assembled molecules had a higher N50 and a longer cumulative length than the sequence assembly. The estimated size of the Tribolium genome is ~200 Mb.
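N50, used in these tables, is the length L such that contigs of length L or longer together hold at least half of the cumulative length. A minimal sketch of that standard definition:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50 of an empty assembly is undefined")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer-safe "running >= total / 2"
            return length
    raise AssertionError("unreachable: running sum always reaches the total")
```

Applied to the contig lengths of each input file, this produces the N50 column.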
  • 8. Simplest XMAP alignment description. [Figure: in silico CMAP 1 (1 Mb) and in silico CMAP 2 (1.1 Mb) from the genome FASTA aligned to BNG CMAP 1 (1.1 Mb) and BNG CMAP 2 (1.3 Mb) from assembled molecules.] Breadth of alignment coverage for the in silico CMAPs: 2.1 Mb. Total alignment length for the in silico CMAPs: 2.1 Mb. Breadth of alignment coverage for the BNG CMAPs: 2.4 Mb. Total alignment length for the BNG CMAPs: 2.4 Mb.
  • 9. Complex XMAP alignment description. [Figure: in silico CMAP 1 (1 Mb) from the genome FASTA aligns to both BNG CMAP 1 (1.1 Mb) and BNG CMAP 2 (1.3 Mb) from assembled molecules.] Breadth of alignment coverage for the in silico CMAP: 1 Mb. Total alignment length for the in silico CMAP: 2 Mb. Breadth of alignment coverage for the BNG CMAPs: 2.4 Mb. Total alignment length for the BNG CMAPs: 2.4 Mb.
  • 10. Alignment of CMAPs. [Figure: in silico CMAP 1 (1 Mb) from the genome FASTA aligned to BNG CMAP 1 (1.1 Mb) and BNG CMAP 2 (1.3 Mb) from assembled molecules.] Comparing breadth of alignment coverage with total aligned length can reveal relevant relationships between assemblies. In this example, the difference between "breadth" and "total" length could be due to duplications in the sample the molecules were extracted from, assembly of alternate haplotypes, mis-assembly creating redundant contigs, or a collapsed repeat in the sequence assembly.
  • 11. Alignment of BNG assembly to reference genome
    CMAP name | Breadth of alignment coverage (Mb) | Total alignment length (Mb) | Percent of CMAP aligned (%)
    in silico CMAP from FASTA | 124.04 | 132.40 | 81
    CMAP from assembled BNG molecules (BNG CMAP) | 131.64 | 132.34 | 67
    Close to 4% of the alignment of the in silico CMAP appears to be redundant. Overall, 81% of the in silico CMAP aligns to the BNG consensus maps.
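Breadth and total alignment length (slides 8 to 11) differ only in how overlapping aligned intervals on the same map are counted. Total length sums every aligned interval. Breadth merges overlapping intervals first, so the difference between the two measures redundancy. Taking percent aligned as breadth divided by cumulative length is one natural definition, and it reproduces the 81% figure for the in silico CMAP (124.04 / 152.53 Mb). A minimal sketch, assuming aligned intervals are given as (map_id, start, end) in bp. The function names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple

Interval = Tuple[str, float, float]  # (map_id, start, end) of one aligned region


def total_length(intervals: Sequence[Interval]) -> float:
    """Sum of all aligned interval lengths, counting overlaps every time."""
    return sum(end - start for _, start, end in intervals)


def breadth(intervals: Sequence[Interval]) -> float:
    """Length covered by at least one alignment, merging overlaps per map."""
    by_map: DefaultDict[str, List[Tuple[float, float]]] = defaultdict(list)
    for map_id, start, end in intervals:
        by_map[map_id].append((start, end))
    covered = 0.0
    for spans in by_map.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:            # overlapping or touching: extend the run
                cur_end = max(cur_end, end)
            else:                           # disjoint: close the current run
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered


def percent_aligned(breadth_len: float, cumulative_len: float) -> float:
    """Share of a map set's cumulative length covered by alignments."""
    return 100.0 * breadth_len / cumulative_len
```

For the slide 9 example, two full-length alignments of the 1 Mb in silico CMAP 1 give a total of 2 Mb but a breadth of 1 Mb.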
  • 12. Alignment of BNG assembly to reference genome. [Figure: ChLG 9 super-scaffold and ChLG 9 scaffolds 130, 131, 133, 134, 132, 129, 135, 127, 136 and 137 aligned to BNG consensus maps.] Where redundant alignments occur, two BNG consensus maps typically align to the same region, suggesting that they represent haplotypes, although this has not been verified.
  • 13. Tribolium super-scaffolds overlapping BNG CMAPs. [Figure: ChLG 9 super-scaffold and ChLG 9 scaffolds 128, 130, 131, 133, 134 and 132 aligned to BNG consensus maps.]
  • 14. Alignment of the BNG assembly to the reference genome was used to super-scaffold the Tribolium scaffolds. Stitch.pl estimates super-scaffolds from alignments between the scaffolds and the assembled BNG molecules, produced with BNG RefAligner using the in silico CMAPs as the reference. [Figure: + in silico CMAP 1, + in silico CMAP 4, + in silico CMAP 2 and - in silico CMAP 3 aligned to BNG CMAP 1 and BNG CMAP 2.]
  • 15. The alignment is then inverted and used as input for stitch.pl. [Figure: the original and inverted alignments of in silico CMAPs 1 to 4 with BNG CMAP 1 and BNG CMAP 2.]
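Inverting an alignment swaps the roles of query and reference, so the BNG CMAPs become the maps that scaffolds are placed along. A minimal sketch, assuming a simplified alignment record whose coordinates are normalized so that start < end on both maps, with the relative orientation stored separately. This record is hypothetical and is not the XMAP column layout.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Alignment:
    """Hypothetical simplified alignment record (not the XMAP format)."""
    query_id: str
    ref_id: str
    q_start: float
    q_end: float
    q_len: float
    r_start: float
    r_end: float
    r_len: float
    orientation: str   # "+" or "-": relative orientation of query vs reference
    confidence: float


def invert(aln: Alignment) -> Alignment:
    """Swap query and reference; relative orientation and confidence are unchanged."""
    return replace(aln,
                   query_id=aln.ref_id, ref_id=aln.query_id,
                   q_start=aln.r_start, q_end=aln.r_end, q_len=aln.r_len,
                   r_start=aln.q_start, r_end=aln.q_end, r_len=aln.q_len)
```

Because coordinates are normalized and orientation is relative, inverting twice returns the original record.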
  • 16. Alignments are filtered on confidence and on alignment length relative to the total possible alignment length. [Figure: inverted alignments of in silico CMAPs 1 to 4 with BNG CMAP 1 and BNG CMAP 2.]
  • 17. Stitch.pl checks alignment length against the potential alignment length to find relevant global, rather than local, alignments. [Figure: BNG CMAP 1 with + in silico CMAP 1.] This alignment passes because the alignment length is greater than 30% of the potential alignment length.
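The 30% rule needs a definition of potential alignment length, and the slides do not give one. One plausible reading, assumed here, is the aligned span extended in both directions until either map ends. Under that reading, a short local match in the middle of two long maps scores low, while a match that runs to a map end scores high. Coordinates follow a hypothetical convention: start < end on both maps, with orientation stored separately.

```python
def potential_alignment_length(q_start: float, q_end: float, q_len: float,
                               r_start: float, r_end: float, r_len: float,
                               orientation: str) -> float:
    """Aligned reference span plus the overhang that could still align on each side.

    Hypothetical definition: extend the alignment outward until either map ends.
    """
    if orientation == "+":
        left = min(r_start, q_start)
        right = min(r_len - r_end, q_len - q_end)
    else:  # "-": the query runs backwards along the reference
        left = min(r_start, q_len - q_end)
        right = min(r_len - r_end, q_start)
    return (r_end - r_start) + left + right


def passes_length_filter(q_start: float, q_end: float, q_len: float,
                         r_start: float, r_end: float, r_len: float,
                         orientation: str, min_fraction: float = 0.30) -> bool:
    """Pass when the aligned span exceeds min_fraction of the potential length."""
    potential = potential_alignment_length(q_start, q_end, q_len,
                                           r_start, r_end, r_len, orientation)
    return (r_end - r_start) > min_fraction * potential
```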
  • 18. [Figure: BNG CMAP 1 with + in silico CMAP 2.] This alignment passes because the alignment length is greater than 30% of the potential alignment length.
  • 19. [Figure: BNG CMAP 2 with - in silico CMAP 2.] This alignment passes because the alignment length is greater than 30% of the potential alignment length.
  • 20. [Figure: BNG CMAP 2 with - in silico CMAP 2.] This alignment fails because the alignment length is less than 30% of the potential alignment length.
  • 21. [Figure: BNG CMAP 2 with + in silico CMAP 2.] This alignment fails because the alignment length is less than 30% of the potential alignment length.
  • 22. [Figure: BNG CMAP 2 with - in silico CMAP 3.] This alignment passes because the alignment length is greater than 30% of the potential alignment length.
  • 23. [Figure: BNG CMAP 2 with - in silico CMAP 3.] This alignment fails because the alignment length is less than 30% of the potential alignment length.
  • 24. [Figure: BNG CMAP 2 with + in silico CMAP 4.] This alignment passes because the alignment length is greater than 30% of the potential alignment length.
  • 25. High-quality scaffolding alignments... [Figure: BNG CMAP 1 and BNG CMAP 2 with + in silico CMAP 2, - in silico CMAP 3, + in silico CMAP 4 and + in silico CMAP 1.]
  • 26. ...are filtered for the longest, highest-confidence alignment for each in silico CMAP. [Figure: the passing alignments of in silico CMAPs 1 to 4 to BNG CMAP 1 and BNG CMAP 2, before and after filtering.]
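Keeping one alignment per in silico CMAP amounts to a group-by with a ranking key. The slide does not say how length and confidence are combined. This sketch assumes alignment length is compared first and confidence breaks ties. The tuple layout is hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (in_silico_id, bng_id, aligned_length_bp, confidence): hypothetical layout
Aln = Tuple[str, str, float, float]


def best_per_scaffold(alignments: Iterable[Aln]) -> Dict[str, Aln]:
    """Keep the longest alignment per in silico CMAP, breaking ties by confidence."""
    best: Dict[str, Aln] = {}
    for aln in alignments:
        scaffold, _, length, confidence = aln
        current = best.get(scaffold)
        if current is None or (length, confidence) > (current[2], current[3]):
            best[scaffold] = aln
    return best
```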
  • 27. Passing alignments are used to super-scaffold. [Figure: BNG CMAP 1 and BNG CMAP 2 with + in silico CMAP 2, - in silico CMAP 3, + in silico CMAP 1 and + in silico CMAP 4 placed along them.]
  • 28. Stitch is iterated, and additional super-scaffolding alignments are found. Iteration takes advantage of alignments where sequence-based scaffolds stitch BNG consensus maps together. [Figure: BNG CMAP 1 and BNG CMAP 2 with + in silico CMAP 2, - in silico CMAP 3, + in silico CMAP 1 and + in silico CMAP 4.]
  • 29. Stitch is iterated until all super-scaffolds are joined. [Figure: BNG CMAP 1 and BNG CMAP 2 with + in silico CMAP 2, - in silico CMAP 3, + in silico CMAP 1 and + in silico CMAP 4, before and after joining.]
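A simplified model of why iteration helps: if one sequence scaffold aligns to two BNG consensus maps, it links them, and repeated passes keep merging linked groups until nothing changes. The sketch treats this as finding connected components with union-find over a hypothetical list of (in silico CMAP, BNG CMAP) links. It illustrates the idea only and is not stitch.pl's actual iteration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def linked_groups(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group scaffolds and BNG maps connected through shared alignments (union-find)."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    for scaffold, bng_map in links:
        parent[find(scaffold)] = find(bng_map)   # union the two groups

    groups: Dict[str, Set[str]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())
```

In this model, a scaffold aligned to the ends of both BNG CMAP 1 and BNG CMAP 2 places both maps, and every scaffold on them, in one group.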
  • 30. If a gap length is estimated to be negative, the gap is represented by a 100 bp filler. [Figure: BNG CMAP 1 and BNG CMAP 2 with - in silico CMAP 3, + in silico CMAP 2, + in silico CMAP 4 and + in silico CMAP 1.]
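Gap lengths can be estimated by projecting each placed scaffold's full extent onto the BNG map and measuring the distance between neighbours. Overlapping neighbours give a negative estimate, which is replaced by the 100 bp filler described on slide 30. The projection formula and the function names are assumptions. Coordinates use start < end on both maps, with orientation stored separately.

```python
from typing import List, Tuple

FILLER_BP = 100  # negative gaps are represented by 100 bp fillers (from the slides)


def projected_extent(q_start: float, q_end: float, q_len: float,
                     r_start: float, orientation: str) -> Tuple[float, float]:
    """Span the whole scaffold would cover on the BNG map, given its alignment.

    For "-" the scaffold runs backwards, so its unaligned tail (q_len - q_end)
    lies to the left of r_start.
    """
    if orientation == "+":
        start = r_start - q_start
    else:
        start = r_start - (q_len - q_end)
    return start, start + q_len


def estimate_gaps(extents: List[Tuple[float, float]]) -> List[float]:
    """Raw gap estimates between neighbouring scaffolds ordered along the map."""
    ordered = sorted(extents)
    return [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]


def filler_length(gap_estimate: float) -> float:
    """Non-negative gaps keep their estimate; negative ones become a 100 bp filler."""
    return gap_estimate if gap_estimate >= 0 else FILLER_BP
```

Keeping the raw estimates separate from the filled values preserves the negative gap lengths for the distributions on the following slides.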
  • 31. Gap lengths. [Figure: distribution of gap lengths for the automated output; count against gap length (bp), from -1,500,000 to 1,000,000 bp, with negative and positive gap lengths distinguished.] Of the automated stitch.pl Tribolium super-scaffolds, 66 gaps had known lengths and 26 had negative lengths (set to 100 bp). Of the manually edited Tribolium super-scaffolds, 66 gaps had known lengths and 24 had negative lengths (set to 100 bp).
  • 33. Negative gap lengths. The longest negative gap length is from a BNG consensus map joining in silico CMAPs 23 and 136. [Figure: alignments of in silico CMAPs 22, 23, 129, 136 and 137, annotated: "Is part of scaffold_23 connected to 136? I went with the second alignment (21-26 together and 136-137 together) because it is supported by genetic maps, but we should check these assemblies. In the bottom alignment of 136 you can see that a large section of BNG map 32 (which joins 23 to 136) is a duplicate in the BNG assembly?"]
  • 34. Negative gap lengths. Because the same region of in silico CMAP 136 aligns to another BNG consensus map that aligns to its chromosome linkage group, this alignment was rejected and stitch was re-run. [Figure: alignments of in silico CMAPs 22, 23, 129, 136 and 137.]
  • 35. Negative gap lengths. Two new super-scaffolds were created, and their sequence similarity is being evaluated. [Figure (min confidence 10): ChLG 2 super-scaffolds and scaffolds aligned to BNG consensus maps, with scaffold labels 127, 129 to 145, U, 14, 16, 18 to 28 and 30, annotated "scaffold_133 aligns but is not visible in IrysView?" and "why is Super_scaffold_65 backwards?"]
  • 36. Gap lengths. [Figure: distribution of gap lengths for the automated output; count against gap length (bp), with negative and positive gap lengths distinguished.] This negative alignment also indicated a potential assembly issue.
  • 37. Negative gap lengths. This negative gap length is from a BNG consensus map joining in silico CMAPs 81, 102 and 103. Half of scaffold_81 aligns with ChLG 7.
  • 38. Negative gap lengths. Half of scaffold_81 aligns with ChLG 7. [Figure: in silico CMAPs 79, 80, 81, 82 and 83.] Because the other half of 81 aligns to another BNG consensus map that aligns to its chromosome linkage group, this alignment was rejected and stitch was re-run. The BNG maps suggest a mis-assembly of in silico CMAP 81 at the sequence level.
  • 39. Gap lengths. [Figure: distribution of gap lengths for the automated output; count against gap length (bp), from -1,500,000 to 1,000,000 bp, with the region below -40,000 bp shaded.] All extremely negative gap lengths (below -40,000 bp, shaded) were independently flagged as potential sequence mis-assemblies to be checked at the sequence level.
  • 40. Gap lengths. All gaps from the shaded region were also manually rejected, and stitch.pl was rerun without them for the current super-scaffolded assembly. We suspect that extremely negative gap lengths may be useful in locating sequence mis-assemblies.
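The flagging rule from slides 39 and 40 is a threshold on the raw (unfilled) gap estimates. A minimal sketch. The record layout pairing each gap with the two in silico CMAPs it joins is hypothetical.

```python
from typing import List, Tuple

# Gaps more negative than this were flagged as potential mis-assemblies (slides 39-40).
MISASSEMBLY_CUTOFF_BP = -40_000

# (left in silico CMAP, right in silico CMAP, raw gap estimate in bp): hypothetical layout
Gap = Tuple[str, str, int]


def flag_suspect_joins(gaps: List[Gap], cutoff: int = MISASSEMBLY_CUTOFF_BP) -> List[Gap]:
    """Joins whose raw gap estimate falls below the cutoff, most negative first."""
    return sorted((gap for gap in gaps if gap[2] < cutoff), key=lambda gap: gap[2])
```

Flagged joins would then be rejected before stitch.pl is rerun, and their scaffolds checked at the sequence level.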
  • 41. Tribolium super-scaffolds
    Input file | N50 (Mb) | Number of contigs | Cumulative length (Mb)
    Genome FASTA | 1.16 | 2240 | 160.74
    Super-scaffold FASTA | 4.46 | 2150 | 165.92
    The N50 of the super-scaffolded genome was ~4 times greater than the original. The super-scaffolds tend to agree with the Tribolium genetic map.
  • 42. Tribolium super-scaffolds. Stitch.pl parameters for Tribolium: first minimum percent aligned = 30%, first minimum confidence = 13; second minimum percent aligned = 90%, second minimum confidence = 8. Lower-quality alignments were manually selected if the genetic map also supported the order. Complex scaffolds were broken manually for sequence-level evaluation.
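The two parameter pairs suggest a two-tier rule: an alignment can pass either with a moderate percent aligned and high confidence, or with a near-complete alignment and lower confidence. This sketch assumes the tiers are combined with OR and treats both thresholds as inclusive minimums. Both are readings of the slide, not confirmed stitch.pl behaviour.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tier:
    min_percent_aligned: float
    min_confidence: float


# Values from slide 42; combining the tiers with OR is an assumption.
TRIBOLIUM_TIERS: Tuple[Tier, ...] = (Tier(30.0, 13.0), Tier(90.0, 8.0))


def passes(percent_aligned: float, confidence: float,
           tiers: Tuple[Tier, ...] = TRIBOLIUM_TIERS) -> bool:
    """True if the alignment meets both minimums of at least one tier."""
    return any(percent_aligned >= t.min_percent_aligned and confidence >= t.min_confidence
               for t in tiers)
```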
  • 43. Tribolium super-scaffolds (min confidence 10). From ChLG X, 11 of the previous 13 scaffolds were joined with two unplaced scaffolds (U) into one super-scaffold. ChLG X was reduced from 13 scaffolds to 2, with one scaffold being moved to ChLG 3. [Figure: ChLG X super-scaffold and ChLG X scaffolds U, 3 to 7, U and 8 to 13 aligned to BNG consensus maps.]
  • 44. Tribolium super-scaffolds (min confidence 10). The second scaffold from ChLG X aligned to scaffolds from a portion of ChLG 3. [Figure: ChLG 3 super-scaffolds and ChLG 3 scaffolds aligned to BNG consensus maps; labels include 32 to 42, 2, 43 to 46, 51 and U.]
  • 45. Tribolium super-scaffolds (min confidence 10). Two unplaced scaffolds aligned to ChLG X. [Figure: ChLG X super-scaffold and ChLG X scaffolds U, 3 to 7, U and 8 to 13 aligned to BNG consensus maps.]
  • 46. Tribolium super-scaffolds (min confidence 10). The 4% redundancy in alignment may be from assembly of haplotypes (generally observed as two BNG consensus maps aligning to the same in silico map). [Figure: ChLG X super-scaffold and ChLG X scaffolds U, 3 to 7, U and 8 to 13 aligned to BNG consensus maps.]
  • 47. Tribolium super-scaffolds overlapping BNG CMAPs (min confidence 10). [Figure: ChLG X super-scaffold and ChLG X scaffolds U, 3 to 7, U and 8 to 13 aligned to BNG consensus maps.]
  • 48. Tribolium super-scaffolds (min confidence 10). For ChLG 9, 21 scaffolds were reduced to 9. [Figure: ChLG 9 super-scaffolds and ChLG 9 scaffolds 127 to 145 aligned to BNG consensus maps, annotated "scaffold_133 aligns but is not visible in IrysView?" and "why is Super_scaffold_65 backwards?"]
  • 49. Tribolium super-scaffolds (min confidence 10). For ChLG 5, 17 scaffolds were reduced to 4. [Figure: ChLG 5 super-scaffolds and ChLG 5 scaffolds 68 to 83 and U aligned to BNG consensus maps.]
  • 50. Acknowledgements. K-INBRE Bioinformatics Core: Susan Brown (PI); Nic Herndon (script development); Nanyan Lu (manual editing); Michelle Coleman (extractions and running the Irys); Zachary Sliefert (metric summaries). Bionano Genomics: Ernest Lam (assembly pipeline best practices assistance); Weiping Wang (assistance with data formats); Palak Sheth (collaboration to standardize analysis). Script availability: https://github.com/i5K-KINBRE-script-share/Irys-scaffolding. BNG scripts are available by request from BNG.