Bionano genome maps_feb2014

Whole genome restriction maps for nonmodel organisms: genomic resources where there were none.

Transcript

  • 1. Whole genome restriction maps for nonmodel organisms: genomic resources where there were none. Sue Brown, Division of Biology, Kansas State University. Tuesday, February 25, 14
  • 2. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 3. Genomes • Genomes come in many sizes • Genome assemblies come in many qualities • Draft Assemblies ▫ Most genomes sequenced today (nonmodel) • Finished Assemblies ▫ Model organisms (lots of resources: human, computational, genetic and genomic tools) • Genomic resources increase the value of the genome sequence ▫ Reverse genetic approaches
  • 4. Many initiatives to sequence genomes • 1,000 human genomes ▫ To provide a deep catalog of human genetic variation • Genome 10K: started as an initiative to sequence 10,000 vertebrate genomes. The database currently catalogs specimens from over 16,000 organisms ▫ To understand how complex animal life evolved through changes in DNA and use this knowledge to become better stewards of the planet
  • 5. Letter to Science announces i5K in 2011
  • 6. Why sequence 5,000 insect genomes? • 53% of all living species • Maintenance and productivity of natural and agricultural ecosystems • Consume or damage 25% of all agricultural, forestry and livestock production ▫ >$30 billion in annual loss • Vector plant, animal and human disease ▫ >$50 billion cost worldwide • Just as human and veterinary medicine now rely on personal or animal genome info, revealing the info stored in insect genomes will transform our ability to manage insects that threaten our health, food supply and economic security • Improve our lives
  • 7. Standard Draft Genome Assemblies • Highly fragmented, even at deep coverage • Scaffolds terminate in repetitive regions • Relatively low N50 values • Example: 7x Sanger-based Tribolium castaneum genome assembly
  • 8. Tribolium castaneum genomics • Cot analysis ▫ Genome ~200 Mb ▫ Long stretches of unique sequence ▫ Low methylation • 9 autosomes, X and Y • Credit: Jeff Stuart, Purdue
  • 9. Standard Draft • Minimally filtered or unfiltered data, from any number of different sequencing platforms, that are assembled into contiguous strings of bases (AGTC) with no gaps (contigs). This is the minimum standard for submission to public databases. Science Oct 9, 2009 pp236-237 http://compbio.pbworks.com
  • 10. Molecular linkage map used to anchor scaffolds in chromosome builds (ChLG) • Low X coverage, no Y, marker density varies
  • 12. T. castaneum assembly stats • Number of contigs: 8,814 • Contig N50: 43,511 • Number of scaffolds: 481 • Scaffold N50: 975,455 • Total number of chromosomes: 10 (-Y) • Unmapped scaffolds: 352 • Single contig scaffolds: 1835 • (481 + 1830 = 2321 scaffolds total)
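The contig and scaffold N50 values on this slide are the usual summary of how fragmented an assembly is. As a minimal sketch (the function name and the toy lengths are illustrative, not taken from the talk), N50 can be computed from a list of sequence lengths like this:

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Toy example: total is 300, and the two longest pieces (80 + 70) reach 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))  # prints 70
```

A higher N50 means that half of the assembly sits in longer pieces, which is why the talk tracks scaffold N50 before and after adding genome maps.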
  • 13. Scaffold structure of the Tribolium genome assembly [Figure labels: ChLG: NW, AAJJ, 300K Ns; Unanchored: DS, AAJJ, 300K Ns]
  • 14. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 15. Genome assembly improvements: an independent platform to validate and improve genomes [Poster excerpt, partly truncated] • [...] genetic recombination map to the assembly scaffolds, anchoring greater than 90% of the assembled sequence [1] (fig. 1). To improve this draft assembly, we constructed physical maps of the T. castaneum genome. Using the [...] orientation of scaffolds have been corrected, and scaffolds have been extended by spanning repetitive regions. [1] Nature 2008 452:949-55. • Figure 2, genome refinements ▫ T. castaneum 3.0 (Baylor Sanger 7x draft assembly and molecular genetic map): length 160.466 Mb, 2321 scaffolds, scaffold N50 0.98 Mb, 481 multicontig scaffolds ▫ T. castaneum 4.0 (Illumina long distance jump libraries extended scaffolds into gaps and capturing gaps with Atlas gap-link and gap-filler): length 160.862 Mb, 2219 scaffolds, scaffold N50 1.16 Mb, 411 multicontig scaffolds ▫ T. castaneum 4.0 and gam-ngs (Gam-ngs merged Illumina assembly and T.cas 4.0, extending several unknowns and an LGX scaffold): length 160.864 Mb, 2219 scaffolds, scaffold N50 1.16 Mb, 411 multicontig scaffolds ▫ T. castaneum 4.0 and gam-ngs plus BioNano maps (sequence scaffolds were aligned to maps with IrysView; the alignment was filtered and used to create new scaffolds): length 189.629 Mb, 2153 scaffolds, scaffold N50 3.31 Mb, 341 multicontig scaffolds • [Truncated figure panels: Assembly; Mis-assemblies; ChLG3]
  • 16. How to validate a de novo assembly? • Describe assembly ▫ # contigs, # scaffolds, total bases, N50 lengths ▫ coverage, # ESTs, # orthologs found • But is the assembly accurate? ▫ Compare to BAC sequences ▫ If you have the resources • Need independent (reasonably priced) method
  • 17. Genome maps based on landmarks • BioNano Genomics ▫ San Diego, California • Imaging ultra-long molecules of DNA • Labeled at restriction sites
  • 18. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 19. Introducing the Irys system
  • 20. Labeling schema • BspQI nicks at GCTCTTCN / CGAGAAGN • 10 sites / 100 Kb
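The labeling schema above can be mimicked in silico to estimate label density for a sequence assembly. The following is a minimal sketch, assuming labels sit wherever the 7-base motif GCTCTTC occurs on either strand; the function names and the toy sequence are illustrative, and the sketch reports motif start coordinates rather than exact nick positions:

```python
def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def label_sites(seq, motif="GCTCTTC"):
    """Return sorted 0-based start positions of the motif on both strands."""
    seq = seq.upper()
    sites = set()
    for pattern in {motif, reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            sites.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(sites)


def sites_per_100kb(seq, motif="GCTCTTC"):
    """Label density, in sites per 100 kb of sequence."""
    return 100_000 * len(label_sites(seq, motif)) / len(seq)


# Toy sequence with one forward-strand site and one reverse-strand site.
toy = "AAGCTCTTCAAAGAAGAGCTT"
print(label_sites(toy))
```

Running a check like this on draft scaffolds gives a rough idea of whether they carry enough labels to align to genome maps.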
  • 21. Chip Design
  • 22. Samples loaded into 2 flow cells per chip • 3 lasers • 3 detection channels • Detect YOYO-1 in DNA backbone • Fluorescent nucleotides at labeled sites
  • 23. DNA molecules entering channels
  • 25. A long repeat in the Tribolium genome
  • 26. Mapping individual images back to the map • Regions flanking the repeat are unique • Some sites are polymorphic
  • 27. Limitations of the Irys system • Sample prep is very specific • Requires gram amounts of starting material • Bacterial cells, tissue culture cells, eukaryotic nuclei • Less complex tissue is best ▫ Blood ▫ Embryos • Not applicable to transcriptomics projects • contig N50 >30 Kb (5 restriction sites)
  • 28. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 29. Assembling images into genomic maps • .tiff → .bnx → .cmap
  • 30. Align BNG maps to in silico maps (.xmap)
  • 31. File formats are similar to generating sequence data...
    Sequence data: image files → (basecall) → fastq → (de novo assemble) → fasta → (align) → sam
    BNG data: image files → (call labels) → bnx → (de novo assemble) → cmap → (align) → xmap
    fastq:
      @SRR014849.2 EIXKN4201AKDUH/2
      TCAAGTGGTGAACGGCAGAAA
      +
      <=B:==B:=<?6=B;<;=B=)
    fasta:
      >contig1
      TCAAGTGGTGAACGGCAGAAA
    sam:
      HWI-ST330_C0NEHACXX:2:1101:17113:52802#0  69  contig1  2578  0  *  =  2578  0  ATTACGGCCCATGGTTCAGAATAATGACGAATAGAAATACTAGTACTATATCCCCTAAAAAA  <@CFFFFFHHGFHJHIJJJJJJJJJFJJJFGFHEHIHGHJGIJHIIIJJJJJJJJIJIIJIH  YT:Z:UP
    bnx:
      0     21       202146.4
      1     1096.2   8973.8
      QX11  10.0565  11.7966
      QX12  0.0187   0.0604
    cmap:
      #h CMapId  ContigLength  NumSites  SiteID  LabelChannel  Position  StdDev  Coverage  Occurrence
      #f int     float         int       int     int           float     float   int       int
      393        225073.2      21        1       1             20.0      0.0     3         3
    xmap:
      #h XmapEntryID  QryContigID  RefcontigID  QryStartPos  QryEndPos  RefStartPos  RefEndPos  Orientation  Confidence  HitEnum
      #f int          int          int          float        float      float        float      string       float       string
      1               94           1            444392.7     5839.8     57024.0      550038.8   -            28.87       1M1D2M3I4D1M3I2M1I7M1I1M1I9M1I1M1I2M1I3M1D2M
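Because an xmap is a delimited text table with a `#h` header line, it can be read with a few lines of code. The sketch below is a hypothetical reader, not part of IrysView or the Irys-scaffolding scripts; it assumes whitespace-delimited fields and uses only the column names shown on the slide:

```python
# Columns from the slide's '#f' line that hold floating point values.
FLOAT_FIELDS = ("QryStartPos", "QryEndPos", "RefStartPos", "RefEndPos", "Confidence")


def read_xmap(lines):
    """Yield one dict per alignment row, keyed by the names on the '#h' line."""
    columns = None
    for line in lines:
        line = line.strip()
        if line.startswith("#h"):
            columns = line[2:].split()
        elif line and not line.startswith("#"):
            if columns is None:
                raise ValueError("data row seen before the '#h' header line")
            row = dict(zip(columns, line.split()))
            for name in FLOAT_FIELDS:
                if name in row:
                    row[name] = float(row[name])
            yield row


# The header and the single alignment row shown on the slide.
EXAMPLE = """\
#h XmapEntryID\tQryContigID\tRefcontigID\tQryStartPos\tQryEndPos\tRefStartPos\tRefEndPos\tOrientation\tConfidence\tHitEnum
#f int\tint\tint\tfloat\tfloat\tfloat\tfloat\tstring\tfloat\tstring
1\t94\t1\t444392.7\t5839.8\t57024.0\t550038.8\t-\t28.87\t1M1D2M3I4D1M3I2M1I7M1I1M1I9M1I1M1I2M1I3M1D2M
"""

for row in read_xmap(EXAMPLE.splitlines()):
    print(row["QryContigID"], row["Orientation"], row["Confidence"])
```

In practice the same function can be given an open file handle, since iterating over a file also yields lines.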
  • 32. Visualizing an xmap [Figure labels: contig id; sequence-based scaffold; label alignment; BioNano contig map; coverage]
  • 33. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 35. K-INBRE i5K GitHub scripts: Irys-scaffolding • Scripts and manuals written by Jennifer Shelton and Nic Herndon • Assembly workflow was developed by Ernest Lam (BioNano) • git pull https://github.com/i5K-KINBRE-script-share/Irys-scaffolding
  • 36. Assembly pipeline developed with Ernest Lam (BioNano) • Scripts available at the i5K-KINBRE script share on GitHub: Irys-scaffolding https://github.com/i5K-KINBRE-script-share/Irys-scaffolding
  • 37. Filtering alignments • Label density varies throughout the genome, so we created scripts to filter in two passes ▫ Pass 1: looks for a high confidence score over at least ~30% of the total possible alignment ▫ Pass 2: looks for a low confidence score over the majority of the total possible alignment (~90%) • Pass 1 finds most high quality alignments. Pass 2 finds high-quality low-density alignments.
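The two-pass filter described above can be sketched as a single predicate. This is an illustration of the idea only, not the code from the Irys-scaffolding scripts: the function name is hypothetical, the 30% and 90% fractions come from the slide, and the two confidence cutoffs (20 and 15) are placeholder assumptions because the slide does not state them.

```python
def keep_alignment(confidence, aligned_fraction,
                   high_confidence=20.0, min_fraction_pass1=0.30,
                   low_confidence=15.0, min_fraction_pass2=0.90):
    """Return True if an alignment survives either filtering pass.

    confidence:       alignment confidence score from the xmap
    aligned_fraction: aligned length / total possible alignment length (0 to 1)
    The confidence cutoffs are illustrative assumptions, not published values.
    """
    # Pass 1: strong score, even if only part of the possible length aligns.
    pass1 = confidence >= high_confidence and aligned_fraction >= min_fraction_pass1
    # Pass 2: weaker score, but nearly all of the possible length aligns.
    pass2 = confidence >= low_confidence and aligned_fraction >= min_fraction_pass2
    return pass1 or pass2


print(keep_alignment(28.87, 0.35))  # strong score over a third of the length
print(keep_alignment(16.0, 0.95))   # weaker score over almost the full length
print(keep_alignment(16.0, 0.50))   # weaker score over half the length
```

The second pass exists because a region with few labels cannot reach a high score even when the alignment is correct, so near-complete coverage is accepted as compensating evidence.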
  • 38. Filtering alignments • Super-scaffolded scaffolds are joined in a new reference fasta file • Overlapping scaffolds have a 30 bp spacing gap between them • If a scaffold aligns more than once, only the longest alignment is used • If two alignments have the same length, only the highest confidence alignment is used
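The selection rules on this slide amount to ranking each scaffold's alignments by length, then by confidence. A minimal sketch follows, with hypothetical function names and dictionary keys; writing the 30 bp spacer as a run of N characters is an assumption, since the slide gives only the gap size.

```python
def pick_alignments(alignments):
    """Keep one alignment per scaffold: the longest, with confidence as tie-breaker."""
    best = {}
    for aln in alignments:
        rank = (aln["aligned_length"], aln["confidence"])
        kept = best.get(aln["scaffold"])
        if kept is None or rank > (kept["aligned_length"], kept["confidence"]):
            best[aln["scaffold"]] = aln
    return best


def join_with_spacer(left_seq, right_seq, spacer_bp=30):
    """Join two overlapping scaffolds with a fixed spacer (assumed to be Ns)."""
    return left_seq + "N" * spacer_bp + right_seq


candidates = [
    {"scaffold": "s1", "map": 1, "aligned_length": 500.0, "confidence": 20.0},
    {"scaffold": "s1", "map": 2, "aligned_length": 500.0, "confidence": 25.0},
    {"scaffold": "s1", "map": 3, "aligned_length": 400.0, "confidence": 40.0},
    {"scaffold": "s2", "map": 4, "aligned_length": 300.0, "confidence": 18.0},
]
chosen = pick_alignments(candidates)
print({name: aln["map"] for name, aln in chosen.items()})
```

Here scaffold s1 keeps its alignment to map 2: maps 1 and 2 tie on length, so the higher confidence wins, while the shorter alignment to map 3 loses despite its higher score.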
  • 39. Outline • de novo genome assembly and i5K • Improving assemblies with Bionano genome maps ▫ Irys system ▫ File formats ▫ Assembly pipeline ▫ Alignment filtering • Results
  • 40. BNG restriction maps for T. castaneum • Dual nicked with BspQI and BbvCI • 28.6 Gb = ~143x coverage of 200 Mb Tribolium genome (>150 Kb) • N contigs: 216 • Total Contig Len (Mb): 200.473 • Avg. Contig Len (Mb): 0.928 • Contig N50 (Mb): 1.350 • Total Ref Len (Mb): 157.186 • Total Contig Len / Ref Len: 1.275
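The derived figures on this slide follow directly from the raw numbers. As a quick check (the variable and function names are illustrative; all inputs are the values quoted on the slide):

```python
def fold_coverage(total_gb, genome_mb):
    """Fold coverage from total data (Gb) and genome size (Mb)."""
    return total_gb * 1000 / genome_mb


total_data_gb = 28.6       # molecules >150 Kb
genome_size_mb = 200       # estimated Tribolium genome size
n_contigs = 216
total_contig_mb = 200.473
total_ref_mb = 157.186

coverage = fold_coverage(total_data_gb, genome_size_mb)
avg_contig_mb = total_contig_mb / n_contigs
map_to_ref_ratio = total_contig_mb / total_ref_mb

print(round(coverage), round(avg_contig_mb, 3), round(map_to_ref_ratio, 3))
```

The ratio above 1 shows that the genome maps span more sequence than the reference assembly they were compared against.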
  • 41. ChLG X • ChLGX had 13 scaffolds. Alignment to BioNano maps captured gaps and validated order for 11 of 13 scaffolds, incorporated 2 unplaced scaffolds and identified a potential misplaced scaffold (scaffold 2 aligns with another linkage group).
  • 44. ChLG 7 • Alignment to BioNano maps captured gaps and validated order for 13 of 15 scaffolds. Scaffold 14 needs to be reversed in the super-scaffold.
  • 46. Additional chromosome linkage groups.
  • 47. Additional chromosome linkage groups. ChLG 3
  • 48. Additional chromosome linkage groups. ChLG 3, ChLG 9
  • 49. Additional chromosome linkage groups. ChLG 3, ChLG 9, ChLG 2
  • 50. What does it cost? • 100-500 Mb genome: <$5,000 ▫ 70-100x coverage • 1 Gb genome: <$8,000 ▫ 70-100x coverage • Completely dependent on homogeneity of starting material • Assembly and analysis software is included in price
  • 51. Summary • Standard Draft Genomes are highly fragmented • BNG provides independent platform • Whole genome restriction maps • Validate assembly • Extend scaffolds/Size Gaps • Identify structural variants • Identify haplotypes • Comprehensive view of repetitive DNA (HORs) • A validated genome assembly improves downstream analyses
  • 52. Thanks to: • Michelle Gordon ▫ Research Assistant: optimizing sample preps • Jennifer Shelton ▫ Biologist turned Bioinformaticist • Nic Herndon ▫ Computer scientist turned Bioinformaticist • BioNano Genomics ▫ Ernest Lam ▫ Weiping Wang