
Using BioNano Maps to Improve an Insect Genome Assembly​

Algorithms and filters used to improve the Tribolium draft Assembly with Physical Maps Based on Imaging Ultra-Long Single DNA Molecules. Video of Webinar available at BioNano Genomics website http://www.bionanogenomics.com/bionano-community/webinars/ as "Using BioNano Maps to Improve an Insect Genome Assembly​".


  1. Using BioNano Maps to Improve an Insect Genome Assembly. Sue Brown, Oct 23, 2014
  2. Tribolium castaneum, the red flour beetle: a genetic model organism for development, physiology, and toxicology studies
     • Easily cultured
     • Short generation time
     • Small genome size
     • Molecular and visible marker genetic maps
     • Genetic tools: balancers, deficiencies
     • Genomic libraries: lambda and BAC
     • cDNA libraries
     • Mutant analysis and RNAi
     • Transformation
     • 7x Sanger draft genome (Nature, 2008)
  3. Tribolium Genome. Genome size: 200 Mb (Cot analysis); 9 autosomes, X and Y; low methylation; long period interspersion. (Jeff Stuart, Purdue University)
  4. Molecular map markers used to anchor scaffolds to chromosome builds. Few X markers, no Y markers, variable marker density.
  5. Tcas 3.0 Reference Genome stats from NCBI
     Input file        | N50 (Mb) | Number | Cumulative Length (Mb)
     Genome contigs    | 0.04     | 8814   | 160.74
     Genome scaffolds  | 0.98     | 481    | 152.53
     Unmapped scaffolds: 352. Unmapped contigs: 1884.
  6. Algorithms and filters used to improve the Tribolium draft Assembly with Physical Maps Based on Imaging Ultra-Long Single DNA Molecules. Jennifer Shelton, 2014
  7. Data formats. BNX: text file of molecules. (Diagram: BNX molecule 1)
  8. Data formats. BNX: text file of molecules. CMAP: text file of consensus maps. (Diagram: BNX molecule 1)
  9. Data formats. BNX: text file of molecules. CMAP: text file of consensus maps. in silico CMAP: from genome FASTA. (Diagram: BNX molecule 1, in silico CMAP 1)
  10. Data formats. BNX: text file of molecules. CMAP: text file of consensus maps. in silico CMAP: from genome FASTA. BNG CMAP: from assembled molecules. (Diagram: BNX molecule 1, in silico CMAP 1, BNG CMAP 1)
  11. Data formats. BNX: text file of molecules. CMAP: text file of consensus maps. in silico CMAP: from genome FASTA. BNG CMAP: from assembled molecules. XMAP: text file of alignment of two CMAPs. (Diagram: BNX molecule 1; in silico CMAP 1 and 2 aligned to BNG CMAP 1 and 2)
  12. Assembly Pipeline
      Assembly workflow:
      1) The Irys produces tiff files that are converted into BNX text files.
      2) Each chip produces one BNX file for each of two flowcells.
      3) BNX files are split by scan and aligned to the sequence reference. Stretch (bases per pixel) is recalculated from the alignment.
      4) Quality check graphs are created for each pre-adjusted flowcell BNX.
      5) Adjusted flowcell BNXs are merged.
      6) The first assemblies are run with a variety of p-value thresholds.
      7) The best of the first assemblies (red oval) is chosen and a version of this assembly is produced with a variety of minimum molecule length filters.
      Highlighted step: 3) Use sequence reference to adjust molecule stretch for each scan.
      Diagram: BNX → scanBNX → adjBNX → mergeBNX → first assemblies (strict, default, and relaxed p-value thresholds) → second assemblies (strict and relaxed minimum molecule length).
  13. Assembly Pipeline. In recent datasets, when SNR is low and alignment is good, we see a spike in bases per pixel (bpp) in the first scan, followed by a plateau and then a lower plateau. (Figure annotation: first scan in a flow cell)
  14. Assembly Pipeline (same workflow and diagram as slide 12). Highlighted step: 5) Use sequence reference to determine assembly noise parameters. Estimated genome size is used to set the p-value threshold.
  15. Assembly Pipeline (same workflow and diagram as slide 12). Highlighted step: 6/7) Variants of the starting p-value and default minimum molecule length are explored in nine assemblies.
  16. Current Tribolium sequence-based assembly
      Input file                 | N50 (Mb) | Number of Scaffolds | Cumulative Length (Mb)
      Genome FASTA               | 1.16     | 2240                | 160.74
      in silico CMAP from FASTA  | 1.20     | 223                 | 152.53
      223 scaffolds from the sequence-based assembly were longer than 20 kb with more than 5 labels and were converted into in silico CMAPs.
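The length and label filter on this slide can be sketched roughly as below. This is only an illustration: the slides do not name the labeling enzyme, so the motif `GCTCTTC` is a placeholder assumption, and the function names are hypothetical rather than taken from the authors' scripts.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def count_labels(seq, motif):
    """Count occurrences of the motif on either strand of seq."""
    seq = seq.upper()
    # A set avoids double counting when the motif is its own reverse complement.
    motifs = {motif.upper(), reverse_complement(motif)}
    count = 0
    for m in motifs:
        start = seq.find(m)
        while start != -1:
            count += 1
            start = seq.find(m, start + 1)
    return count


def passes_cmap_filter(seq, motif="GCTCTTC", min_length=20_000, min_labels=5):
    """Keep scaffolds longer than min_length with more than min_labels labels.

    The default motif is an assumed placeholder, not a value from the slides.
    """
    return len(seq) > min_length and count_labels(seq, motif) > min_labels
```

With these thresholds a scaffold must exceed both limits, so a long scaffold with few label sites is excluded just like a short one.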
  17. Assembly Results
      Input file                                    | N50 (Mb) | Number | Cumulative Length (Mb)
      Genome FASTA                                  | 1.16     | 2240   | 160.74
      in silico CMAP from FASTA                     | 1.20     | 223    | 152.53
      CMAP from assembled BNG molecules (BNG CMAP)  | 1.35     | 216    | 200.47
      BNG assembled molecules had a higher N50 and longer cumulative length than the sequence assembly.
      The estimated size of the Tribolium genome is ~200 Mb.
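The N50 values in these tables follow the usual definition: the length of the shortest map or scaffold in the smallest set of longest sequences that together cover at least half of the total length. A minimal sketch, with a hypothetical function name:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence or map lengths."""
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest until half of the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input
```

The same function applies to scaffold lengths from a FASTA file or consensus map lengths from a CMAP file, as long as all lengths share one unit.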
  18. Simplest XMAP alignment description
      Maps: in silico CMAP 1 (1 Mb) and in silico CMAP 2 (1.1 Mb), from genome FASTA; BNG CMAP 1 (1.1 Mb) and BNG CMAP 2 (1.3 Mb), from assembled molecules.
      Breadth of alignment coverage for in silico CMAP: 2.1 Mb. Total alignment length for in silico CMAP: 2.1 Mb.
      Breadth of alignment coverage for BNG CMAP: 2.4 Mb. Total alignment length for BNG CMAP: 2.4 Mb.
  19. Complex XMAP alignment description
      Maps: in silico CMAP 1 (1 Mb), from genome FASTA; BNG CMAP 1 (1.1 Mb) and BNG CMAP 2 (1.3 Mb), from assembled molecules.
      Breadth of alignment coverage for in silico CMAP: 1 Mb. Total alignment length for in silico CMAP: 2 Mb.
      Breadth of alignment coverage for BNG CMAP: 2.4 Mb. Total alignment length for BNG CMAP: 2.4 Mb.
  20. Alignment of CMAPs
      Maps: in silico CMAP 1 (1 Mb), from genome FASTA; BNG CMAP 1 (1.1 Mb) and BNG CMAP 2 (1.3 Mb), from assembled molecules.
      Breadth of alignment coverage compared to total aligned length can indicate relevant relationships between assemblies.
      In this example, differences between "breadth" and "total" length could be due to:
      • Genomic duplications in the sample that molecules were extracted from
      • Assembly of alternate haplotypes
      • Mis-assembly creating redundant contigs
      • Collapsed repeat in sequence assembly
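The difference between "total" and "breadth" can be made concrete with a small interval calculation. The sketch below assumes alignments on one map are given as (start, end) coordinate pairs in bp; the function names are hypothetical. Total length sums every alignment, while breadth counts each covered position once.

```python
def total_aligned_length(intervals):
    """Sum of all alignment lengths, counting overlapping regions repeatedly."""
    return sum(end - start for start, end in intervals)


def breadth_of_coverage(intervals):
    """Length of the map covered by at least one alignment (overlaps merged)."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged block before starting a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered


# Slide 19 example: two BNG maps both align across a 1 Mb in silico CMAP.
in_silico_1 = [(0, 1_000_000), (0, 1_000_000)]
print(total_aligned_length(in_silico_1), breadth_of_coverage(in_silico_1))
```

For a whole assembly, the same two numbers would be computed per map and summed, so redundant alignments show up as total length exceeding breadth.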
  21. Alignment of BNG assembly to reference genome
      CMAP name                                     | Breadth of alignment coverage for CMAP (Mb) | Length of total alignment for CMAP (Mb) | Percent of CMAP aligned
      in silico CMAP from FASTA                     | 124.04                                      | 132.40                                  | 81
      CMAP from assembled BNG molecules (BNG CMAP)  | 131.64                                      | 132.34                                  | 67
      Close to 4% of the alignment of the in silico CMAP appears to be redundant.
      Overall, 81% of the in silico CMAP aligns to the BNG consensus map.
  22. Alignment of BNG assembly to reference genome. Typically, where redundant alignments occur, two BNG consensus maps aligned, suggesting they represent haplotypes, although this has not been verified. (Figure: ChLG 9 super scaffold, BNG consensus maps, and ChLG 9 scaffolds 130 131 133 134 132 129 135 127 136 137)
  23. Potential haplotypes where overlapping BNG cmaps align. (Figure: ChLG 9 super scaffold, BNG consensus maps, and ChLG 9 scaffolds 128 130 131 133 134 132)
  24. Alignment of BNG assembly to reference genome was used to super-scaffold the Tribolium scaffolds. Stitch.pl estimates super scaffolds using alignments of scaffolds and assembled BNG molecules made with BNG RefAligner. The in silico CMAP is aligned as the reference. (Diagram: + in silico CMAP 1, + in silico CMAP 2, - in silico CMAP 3, + in silico CMAP 4, BNG CMAP 1, BNG CMAP 2)
  25. Alignment of BNG assembly to reference genome was used to super-scaffold the Tribolium scaffolds. Stitch.pl estimates super scaffolds using alignments of scaffolds and assembled BNG molecules made with BNG RefAligner. The in silico CMAP is aligned as the reference; the alignment is then inverted and used as input for stitch. (Diagram as on slide 24)
  26. Alignment of BNG assembly to reference genome was used to super-scaffold the Tribolium scaffolds. Stitch.pl estimates super scaffolds using alignments of scaffolds and assembled BNG molecules made with BNG RefAligner. The in silico CMAP is aligned as the reference; the alignment is then inverted and used as input for stitch. Alignments are filtered based on confidence and on alignment length relative to the total possible alignment length. (Diagram as on slide 24)
  27. Alignment of BNG assembly to reference genome was used to super-scaffold the Tribolium scaffolds. Stitch.pl checks alignment length against potential alignment lengths to find relevant global rather than local alignments. Highlighted: BNG CMAP 1 and + in silico CMAP 1. The alignment passes because the alignment length is greater than 30% of the potential alignment length.
  28. Same check. Highlighted: BNG CMAP 1 and + in silico CMAP 2. The alignment passes because the alignment length is greater than 30% of the potential alignment length.
  29. Same check. Highlighted: - in silico CMAP 2 and BNG CMAP 2. The alignment passes because the alignment length is greater than 30% of the potential alignment length.
  30. Same check. Highlighted: - in silico CMAP 2 and BNG CMAP 2. The alignment fails because the alignment length is less than 30% of the potential alignment length.
  31. Same check. Highlighted: + in silico CMAP 2 and BNG CMAP 2. The alignment fails because the alignment length is less than 30% of the potential alignment length.
  32. Same check. Highlighted: BNG CMAP 2 and - in silico CMAP 3. The alignment passes because the alignment length is greater than 30% of the potential alignment length.
  33. Same check. Highlighted: BNG CMAP 2 and - in silico CMAP 3. The alignment fails because the alignment length is less than 30% of the potential alignment length.
  34. Same check. Highlighted: BNG CMAP 2 and + in silico CMAP 4. The alignment passes because the alignment length is greater than 30% of the potential alignment length.
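One way to picture the "potential alignment length" check on slides 27 to 34 is sketched below. The slides do not define the quantity, so this assumes it is the length over which the two maps would overlap if the observed alignment were extended to the nearest map ends; the function names and coordinate conventions are hypothetical and this is not the stitch.pl code itself.

```python
def percent_of_potential(ref_len, qry_len, ref_start, ref_end, qry_start, qry_end):
    """Aligned length as a percentage of the assumed potential overlap."""
    if qry_start > qry_end:
        # Reverse orientation: express the query in flipped coordinates.
        qry_start, qry_end = qry_len - qry_start, qry_len - qry_end
    aligned = ref_end - ref_start
    # Unaligned sequence that both maps still have on each side.
    left = min(ref_start, qry_start)
    right = min(ref_len - ref_end, qry_len - qry_end)
    potential = aligned + left + right
    return 100.0 * aligned / potential


def passes_global_filter(percent, min_percent=30.0):
    """Pass when the alignment covers more than min_percent of the potential."""
    return percent > min_percent


# End-overlap: the last 300 kb of a 1 Mb map meets the first 300 kb of a 1.3 Mb map.
end_overlap = percent_of_potential(1_000_000, 1_300_000, 700_000, 1_000_000, 0, 300_000)
# Local hit: 100 kb aligned in the middle of both maps.
local_hit = percent_of_potential(1_000_000, 1_300_000, 400_000, 500_000, 600_000, 700_000)
print(passes_global_filter(end_overlap), passes_global_filter(local_hit))
```

Under this assumption an end-to-end overlap scores high even when short, while a short match buried inside two long maps scores low, which is the global versus local distinction the slides describe.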
  35. Alignment of BNG assembly to reference genome was used to super-scaffold the Tribolium scaffolds. High quality scaffolding alignments... (Diagram: BNG CMAP 1, BNG CMAP 2, + in silico CMAP 1, + in silico CMAP 2, - in silico CMAP 3, + in silico CMAP 4)
  36. Alignment of BNG assembly to reference genome was used to super-scaffold the Tribolium scaffolds. High quality scaffolding alignments are filtered for the longest and highest confidence alignment for each in silico CMAP. (Diagram as on slide 35)
  37. Alignment of BNG assembly to reference genome was used to super-scaffold the Tribolium scaffolds. High quality scaffolding alignments are filtered for the longest and highest confidence alignment for each in silico CMAP. Passing alignments are used to super scaffold. (Diagram as on slide 35)
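Selecting one alignment per in silico CMAP might look like the sketch below. The slides do not say how length and confidence are weighed against each other, so ranking by aligned length first and using confidence as the tie-breaker is an assumption, as are the dictionary keys and the function name.

```python
def best_alignment_per_map(alignments):
    """Keep one alignment per in silico CMAP.

    Assumed ranking: longest aligned length, then highest confidence.
    """
    best = {}
    for aln in alignments:
        key = aln["in_silico_id"]
        rank = (aln["aligned_length"], aln["confidence"])
        current = best.get(key)
        if current is None or rank > (current["aligned_length"], current["confidence"]):
            best[key] = aln
    return best


alignments = [
    {"in_silico_id": 1, "bng_id": 1, "aligned_length": 500_000, "confidence": 20.0},
    {"in_silico_id": 1, "bng_id": 2, "aligned_length": 800_000, "confidence": 15.0},
    {"in_silico_id": 2, "bng_id": 2, "aligned_length": 300_000, "confidence": 12.0},
    {"in_silico_id": 2, "bng_id": 1, "aligned_length": 300_000, "confidence": 18.0},
]
chosen = best_alignment_per_map(alignments)
```

Keeping a single alignment per scaffold means each scaffold is placed once in the super-scaffold layout.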
  38. Alignment of BNG assembly to reference genome was used to super-scaffold the Tribolium scaffolds. Stitch is iterated and additional super scaffolding alignments are found. (Diagram: BNG CMAP 1, BNG CMAP 2, + in silico CMAP 1, + in silico CMAP 2, - in silico CMAP 3, + in silico CMAP 4)
  39. Alignment of BNG assembly to reference genome was used to super-scaffold the Tribolium scaffolds. Stitch is iterated and additional super scaffolding alignments are found until all super scaffolds are joined. (Diagram as on slide 38)
  40. Alignment of BNG assembly to reference genome was used to super-scaffold the Tribolium scaffolds. If a gap length is estimated to be negative, the gap is represented by a 100 bp spacer. (Diagram: BNG CMAP 1, BNG CMAP 2, + in silico CMAP 1, + in silico CMAP 2, - in silico CMAP 3, + in silico CMAP 4)
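The spacer rule on this slide can be sketched as follows. Writing gaps as runs of `N` is an assumption based on common FASTA practice, and the function names are hypothetical.

```python
def gap_spacer(estimated_gap_bp, spacer_bp=100):
    """Return the gap sequence: estimated length, or a fixed spacer if negative."""
    length = spacer_bp if estimated_gap_bp < 0 else estimated_gap_bp
    return "N" * length  # 'N' as the gap character is an assumption


def join_scaffolds(sequences, estimated_gaps_bp):
    """Join oriented scaffold sequences with one gap between each adjacent pair."""
    if len(estimated_gaps_bp) != len(sequences) - 1:
        raise ValueError("need exactly one gap estimate between adjacent scaffolds")
    parts = [sequences[0]]
    for seq, gap in zip(sequences[1:], estimated_gaps_bp):
        parts.append(gap_spacer(gap))
        parts.append(seq)
    return "".join(parts)
```

A negative estimate means the two scaffolds appear to overlap on the BNG map, so the fixed spacer marks the join without asserting a gap size.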
  41. Gap lengths. In the automated stitch.pl Tribolium super-scaffolds, 66 gaps had known lengths and 26 had negative lengths (set to 100 bp). In the manually edited Tribolium super-scaffolds, 66 gaps had known lengths and 24 had negative lengths (set to 100 bp). (Figure: distribution of gap lengths for automated output; x axis: gap length (bp), −1,500,000 to 1,000,000; y axis: count, 0 to 20; negative and positive gap lengths marked)
  43. Negative gap lengths. The longest negative gap length is from a BNG consensus map joining in silico 23 and 136. Speaker notes: Is part of scaffold_23 connected to 136? I went with the second alignment (21-26 together and 136-137 together, because it is supported by genetic maps), but we should check these assemblies. In the bottom alignment of 136 you can see that a large section of BNG map 32 (which joins 23 to 136) is a duplicate in the BNG assembly? (Figure scaffolds: 22 23 129 136 137)
  44. Negative gap lengths (speaker notes and figure as on slide 43). Because the same region of 136 aligns to another BNG consensus map that aligns to its chromosome linkage group, this alignment was rejected and stitch was re-run.
  45. Negative gap lengths. Two new super scaffolds were created and the sequence similarity is being evaluated. Min confidence 10. Speaker notes: scaffold_133 aligns but is not visible in IrysView? Why is Super_scaffold_65 backwards? (Figure: ChLG 2 super scaffolds, BNG consensus maps, and ChLG 2 scaffolds; scaffold labels 130 131 133 134 132 129 135 127 136 137 139 138 140 141 142 143 144 145 and U 18 14 16 19 20 21 22 23 24 25 26 27 28 30)
  46. Gap lengths. This negative alignment also indicated a potential assembly issue. (Figure: distribution of gap lengths for automated output; x axis: gap length (bp), −1,500,000 to 1,000,000; y axis: count, 0 to 20; negative and positive gap lengths marked)
  47. Negative gap lengths. This negative gap length is from a BNG consensus map joining in silico 81, 102, and 103. Half of scaffold_81 aligns with ChLG 7.
  48. Negative gap lengths. Half of scaffold_81 aligns with ChLG 7. Because the other half of 81 aligns to another BNG consensus map that aligns to its chromosome linkage group, this alignment was rejected and stitch was re-run. The BNG maps suggest a mis-assembly of in silico 81 at the sequence level. (Figure scaffolds: 79 80 81 82 83)
  49. Gap lengths. All strongly negative gap lengths, below -20,000 bp (shaded), were independently flagged as potential sequence mis-assemblies to be checked at the sequence level. (Figure: distribution of gap lengths for automated output; x axis: gap length (bp), −1,500,000 to 1,000,000; y axis: count, 0 to 20; negative and positive gap lengths marked)
  50. Gap lengths. All gaps from the shaded regions were also manually rejected, and stitch.pl was rerun without them for the current super-scaffolded assembly. We suspect strongly negative gap sizes may be useful in locating sequence mis-assemblies. stitch.pl version 1.4.5 rejects alignments if the negative gap length is below -20,000 bp but lists them in the data summary. (Figure as on slide 49)
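The rejection rule described for stitch.pl 1.4.5 can be pictured as a simple triage of estimated gap lengths. This is a sketch of the rule as stated on the slide, not the script's actual code; the function names and the shape of the summary are hypothetical.

```python
REJECT_BELOW_BP = -20_000  # threshold quoted on the slide


def classify_gap(gap_bp):
    """Triage one estimated gap length (bp)."""
    if gap_bp < REJECT_BELOW_BP:
        return "reject"   # possible sequence mis-assembly, reported for review
    if gap_bp < 0:
        return "spacer"   # small overlap, represented by a 100 bp spacer
    return "keep"         # gap length is used as estimated


def summarize_gaps(gaps_bp):
    """Group gap estimates by class so rejected joins can still be listed."""
    summary = {"keep": [], "spacer": [], "reject": []}
    for gap in gaps_bp:
        summary[classify_gap(gap)].append(gap)
    return summary
```

Listing the rejected joins rather than discarding them silently keeps them available as candidate mis-assemblies to check at the sequence level.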
  51. Tribolium super-scaffolds
      Input file            | N50 (Mb) | Number of Scaffolds | Cumulative Length (Mb)
      genome FASTA          | 1.16     | 2240                | 160.74
      super-scaffold FASTA  | 4.46     | 2150                | 165.92
      N50 of the super-scaffolded genome was ~4 times greater than the original.
      Super-scaffolds tend to agree with the Tribolium genetic map.
  52. Tribolium super-scaffolds
      Input file            | N50 (Mb) | Number of Scaffolds | Cumulative Length (Mb)
      genome FASTA          | 1.16     | 2240                | 160.74
      super-scaffold FASTA  | 4.46     | 2150                | 165.92
      For Tribolium: first minimum percent aligned = 30%, first minimum confidence = 13; second minimum percent aligned = 90%, second minimum confidence = 8.
      Lower quality alignments were manually selected if the genetic map also supported the order.
      Complex scaffolds were broken manually for sequence-level evaluation.
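The two pairs of thresholds listed for Tribolium can be read as two alternative ways for an alignment to qualify. The sketch below assumes an alignment passes if it satisfies either pair (moderate coverage with high confidence, or high coverage with lower confidence); the slides list the values but do not spell out how they combine, and the function name is hypothetical.

```python
def passes_stitch_thresholds(percent_aligned, confidence,
                             first=(30.0, 13.0), second=(90.0, 8.0)):
    """Return True if the alignment meets either (min percent, min confidence) pair.

    Defaults are the Tribolium values from the slide; combining the pairs with
    a logical OR is an assumption.
    """
    return any(
        percent_aligned >= min_percent and confidence >= min_confidence
        for min_percent, min_confidence in (first, second)
    )


print(passes_stitch_thresholds(35.0, 14.0))  # meets the first pair
print(passes_stitch_thresholds(95.0, 9.0))   # meets the second pair
print(passes_stitch_thresholds(35.0, 9.0))   # meets neither pair
```

Read this way, a lower confidence score is tolerated only when nearly the whole potential overlap is aligned.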
  53. Tribolium super-scaffolds (min confidence 10). From ChLG X, 11 of the previous 13 scaffolds were joined with two unplaced scaffolds (U) into one super scaffold. ChLG X was reduced from 13 scaffolds to 2, with one scaffold being moved to ChLG 3. (Figure: ChLG X super scaffold, BNG consensus maps, and ChLG X scaffolds U 3 4 5 6 7 U 8 9 10 11 12 13)
  54. Tribolium super-scaffolds (min confidence 10). The second scaffold from ChLG X aligned to scaffolds from a portion of ChLG 3. (Figure: ChLG 3 super scaffolds, BNG consensus maps, and ChLG 3 scaffolds 32 33 34 35 36 2 37 38 39 40 41 42 and 51 U 43 45 44 46)
  55. Tribolium super-scaffolds (min confidence 10). From ChLG X, 11 of the previous 13 scaffolds were joined with two unplaced scaffolds (U) into one super scaffold. Two unplaced scaffolds aligned to ChLG X. (Figure as on slide 53)
  56. Tribolium super-scaffolds (min confidence 10). From ChLG X, 11 of the previous 13 scaffolds were joined with two unplaced scaffolds (U) into one super scaffold. The 4% redundancy in alignment may be from assembly of haplotypes (generally observed as two BNG consensus maps aligning to the same in silico map). (Figure as on slide 53)
  57. Potential haplotypes where overlapping BNG cmaps align (min confidence 10). From ChLG X, 11 of the previous 13 scaffolds were joined with two unplaced scaffolds (U) into one super scaffold. (Figure as on slide 53)
  58. Tribolium super-scaffolds (min confidence 10). For ChLG 9, 21 scaffolds were reduced to 9. Speaker notes: ChLG9 currently (2150. scaffold_133 aligns but is not visible in IrysView? Why is Super_scaffold_65 backwards? (Figure: ChLG 9 super scaffold, BNG consensus maps, and ChLG 9 scaffolds 128 130 131 133 134 132 129 135 127 136 137 139 138 140 141 142 143 144 145)
  59. Tribolium super-scaffolds (min confidence 10). For ChLG 5, 17 scaffolds were reduced to 4. (Figure: ChLG 5 super scaffold, BNG consensus maps, and ChLG 5 scaffolds 69 68 70 71 72 73 74 U 75 76 77 78 79 80 81 82 83)
  60. Future directions: Structural Variant (SV). Use SV-detect pipelines to resize existing gaps in scaffolds and identify mis-assemblies.
  61. Acknowledgements
      K-INBRE Bioinformatics Core
      • Susan Brown - PI
      • Nic Herndon - script development
      • Nanyan Lu - manual evaluation
      • Michelle Coleman - extractions and running the Irys
      • Zachary Sliefert - metric summaries
      Bionano Genomics
      • Ernest Lam - assembly pipeline best practices assistance
      • Weiping Wang - assistance with data formats
      • Palak Sheth - collaboration to standardize analysis
      Script availability
      https://github.com/i5K-KINBRE-script-share/Irys-scaffolding
      BNG scripts available by request from BNG
      Slide availability
      http://www.slideshare.net/kstatebioinformatics/using-bionano-maps-to-improve-an-insect-genome-assembly
      This project was supported by grants from the National Center for Research Resources (5P20RR016475) and the National Institute of General Medical Sciences (8P20GM103418) from the National Institutes of Health.
      (Fragmentary overlay text: "Physical Molecules"; "University, Warren"; "Kansas State University"; "were constructed by mapping molecular markers from the assembly scaffolds, anchoring greater than 90%")
