Exploiting long read sequencing technology to build a substantially improved pig reference genome sequence

Genome Reference Consortium
Sep. 30, 2016

  1. Exploiting long read sequencing technology to build a substantially improved pig reference genome sequence • Alan Archibald • The Roslin Institute and R(D)SVS, University of Edinburgh
  2. Draft reference pig genome sequence • Swine Genome Sequencing Consortium
  3. Hybrid Shotgun Sequencing Strategy • BAC shotgun reads: minimal set of overlapping BACs selected from physical map • Whole-genome shotgun reads • Combine overlapping whole-genome and BAC-derived reads • Sequence assembly: assemble clone sequences to represent chromosomes and annotate using Ensembl automated pipeline
  4. Sscrofa10.2 – chromosome-assigned scaffolds only (chromosomes 1-18, X, Y); lengths in bp • Contig N50: 80,720 • Contig N90: 13,487 • Average contig length: 31,604 • Largest contig length: 1,598,650 • Scaffold N50: 637,332 • Scaffold N90: 189,449 • Average scaffold length: 436,176 • Largest scaffold length: 3,862,550
  5. BAC Contigs / Fragments • (Paired) end sequences of subclone libraries • 768 subclones / BAC • Av read: 707 bp • phrap • Create fragment chains • Submission to EMBL/GenBank • [Figure: BAC fragments A to G arranged into fragment chain 1 and fragment chain 2, with runs of N between fragments]
  6. Limitations of Sscrofa10.2 • Missing coverage ~10% – Poorly captured in unplaced scaffolds • Local scaffolding issues – Order & orientation of sequence contigs within BACs not resolved unambiguously – No BAC clone sequence assigned to > 1 scaffold • Unresolved redundancy from overlapping BAC clones • Project memory loss – e.g. unplaced FPC contigs listed at end of q-arm
  7. http://geval.sanger.ac.uk/PGP_pig_10_2/Info/Index
  8. Sscrofa10.2 QC • Illumina PE reads from same pig mapped to Sscrofa10.2 • Looked for indicators of structural variation, including high/low coverage, incorrect orientation and abnormal insert sizes • Looked for homozygous variants (the reads come from the same animal, so homozygous differences point to likely assembly errors)
  9. Sscrofa10.2 Chr 1
  10. De novo genome assemblies using Pacific Biosciences long read technology • TJ Tabasco (Duroc 2-14): Duroc sow • MARC1423004: Duroc/Landrace/Yorkshire barrow
  11. PacBio – draft WGS assembly • Duroc 2-14 (same pig as most of Sscrofa10.2) • 65x genome coverage • Pacific Biosciences P6 chemistry • Length cut-off for reads for assembly: 13 kbp • Coverage of corrected reads for assembly: 19x
  12. Contig QC
  13. Variants • Homozygous SNPs: – Sscrofa10.2: 415,056 – PacBio contigs: 34,545 • Homozygous indels: – Sscrofa10.2: 168,037 – PacBio contigs: 1,729,510
  14. Scaffolding • Scaffold by mapping contigs to Sscrofa10.2 – using NUCmer – Assume Sscrofa10.2 gross structure is correct • Radiation Hybrid and Linkage maps, 60K SNPs • FPC physical map • 2.36 Gb ungapped length • 434 contigs
  15. Chromosome 6
  16. Chromosome 6
  17. Gap Filling • Gap filling was done using PBJelly • Further gaps filled using large finished BACs from Sscrofa10.2 assembly – 7 gaps had large sequenced BAC contigs crossing them – We sequenced 5 more BACs • Plus manual placing of some fiddly contigs • 181 gaps remaining • N50 increased to 35.8 Mb #35MbCtgClub
  18. Targeted gap closure CH242-323K10
  19. Targeted gap closure CH242-284F8
  20. Targeted gap closure CH242-284F8
  21. Sequencing Additional BACs • 5 BACs with ends that appear to cross gaps in the assembly – Sequenced using the MinION and assembled into individual contigs using Canu – Polished using Pilon • Mapping of the assembled BAC contigs to the scaffolds showed they could be placed in their expected regions • Potential to fill 129 more gaps in this way #porecamp
  22. Error Correction • Arrow (succeeds Quiver) – Using PacBio reads to error correct assembled sequence – Reduced homozygous SNPs from 34,545 to 27,018 – Reduced homozygous indels from 1,729,510 to 1,036,696 • Pilon (currently running) – Using Illumina mate-pair and paired-end libraries – Can detect and correct SNPs, indels and structural abnormalities, plus potential for gap filling – Expecting to reduce the remaining false variants
  23. Evaluate • Order and orientation with respect to RH map • Order, orientation, distance between paired ends – CH242 BAC ends – Fosmid ends – Illumina mate pairs (5-7 kbp, 9-11 kbp) – Illumina paired ends (500-660 bp) • Gene models
  24. BAC end sequence alignments – orientation & insert size
  25. BBS4
  26. IGF2
  27. CFTR – ST7
  28. ST7
  29. Sscrofa11 - a new pig reference genome sequence worthy of adoption by the GRC • Alan Archibald • The Roslin Institute and R(D)SVS, University of Edinburgh
  30. Adding pig genome to GRC • High quality, highly contiguous genome • Resources for gap closure – Isogenic BAC library CHORI242, ends sequenced – Isogenic fosmid library WTSI_1005, ends sequenced • User communities, incl. SGSC, FAANG • Funding – BBSRC strategic funding (The Roslin Institute) – BBSRC BBR Ensembl – COST Action CA15112 (FAANG-Europe)
  31. Acknowledgements • Roslin Institute – Amanda Warr – Mick Watson – David Hume – Heather Finlayson – Christine Burkard – Lel Eory – Richard Talbot – John Hickey • PacBio – Richard Hall – Jason Chin – Harold Lee – Regina Lam – Kirsti Kim – Jim Burrows • USDA (MARC, BARC) – Tim Smith – Derek Bickhart – Ben Rosen – Steve Schroeder • gEVAL – Will Chow – Kerstin Howe • Other – Sergey Koren – Chris Warkup – Swine Genome Sequencing Consortium • Contact: alan.archibald@roslin.ed.ac.uk, @AlanArchibald51, @FAANGEurope
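
Note on the N50/N90 figures: the contig and scaffold statistics quoted in slides 4 and 17 follow the usual Nx definition, i.e. the length L such that sequences of length L or longer together contain at least x% of the assembled bases. The slides do not say which tool produced the numbers, so the following is only a minimal Python sketch of that definition; the function name `nx` and the toy contig lengths are illustrative assumptions, not values from the talk.

```python
from typing import Iterable


def nx(lengths: Iterable[int], x: float = 50.0) -> int:
    """Return the Nx statistic for a collection of sequence lengths.

    Nx is the length of the sequence at which the running total, taken
    from the longest sequence downwards, first reaches x% of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare in scaled form to avoid dividing the total.
        if running * 100 >= total * x:
            return length
    return ordered[-1]


# Hypothetical toy assembly (lengths in arbitrary units, 300 in total).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print("N50:", nx(toy_contigs, 50))  # 80 + 70 = 150 reaches 50% of 300
print("N90:", nx(toy_contigs, 90))  # cumulative 270 is reached at length 30
```

Because a handful of very long contigs dominates the running total, closing gaps and joining contigs raises N50 sharply, which is the effect reported in slide 17.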