The tomato reference genome is one of the most widely used genomic resources in the Solanaceae and wider plant research communities. We frequently receive questions from the community regarding the assembly versions. This session will explain the changes in the current version of the tomato genome (SL2.50). The current tomato genome build contains numerous inter-contig gaps (median 931 bp, mean 1,869 bp) and inter-scaffold gaps (median 210 kbp, mean 525 kbp). Updates will be provided on the forthcoming tomato genome build (SL3.0), which will include finished BACs (HTGS phase 3) to close the gaps.
1. Tomato Genome SL2.50 and Beyond…
Surya Saha, Jeremy Edwards and Lukas Mueller
Sol Genomics Network (SGN)
Boyce Thompson Institute, Ithaca, NY
ss2489@cornell.edu @SahaSurya
Slides: http://bit.ly/PAGbld230
4. Genome Assembly @NCBI
Contigs
• Components
Tiling Path file (TPF)
• Accession numbers
• Can have nested components
Accession Golden Path files (AGP)
• Scaffold IDs
• Orientation
• Chromosome from contig AGP
• Chromosome from scaffold AGP
• Scaffold from contig AGP
NCBI
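To make the AGP layout above concrete, here is a minimal Python sketch that reads tab-separated AGP lines into records. It assumes the nine-column AGP 2.x layout, where component types N and U mark gaps; the class name, function name, and the example identifiers (chr_demo, scaffold_A, scaffold_B) are hypothetical and are not taken from the SL2.50 files or from Bio-GenomeUpdate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgpLine:
    """One row of an AGP file (sequence component or gap)."""
    obj: str                 # object being assembled, e.g. a chromosome
    obj_beg: int             # 1-based start on the object
    obj_end: int             # 1-based inclusive end on the object
    part_number: int
    component_type: str      # "N"/"U" for gaps, other letters for sequence
    component_id: Optional[str] = None
    component_beg: Optional[int] = None
    component_end: Optional[int] = None
    orientation: Optional[str] = None
    gap_length: Optional[int] = None
    gap_type: Optional[str] = None
    linkage: Optional[str] = None

    @property
    def is_gap(self) -> bool:
        return self.component_type in ("N", "U")


def parse_agp(text: str) -> List[AgpLine]:
    """Parse AGP text, skipping blank lines and '#' comment lines."""
    records = []
    for raw in text.splitlines():
        if not raw.strip() or raw.startswith("#"):
            continue
        f = raw.split("\t")
        common = dict(obj=f[0], obj_beg=int(f[1]), obj_end=int(f[2]),
                      part_number=int(f[3]), component_type=f[4])
        if f[4] in ("N", "U"):
            # Gap line: columns 6-8 are gap length, gap type, linkage.
            records.append(AgpLine(**common, gap_length=int(f[5]),
                                   gap_type=f[6], linkage=f[7]))
        else:
            # Sequence line: columns 6-9 are id, begin, end, orientation.
            records.append(AgpLine(**common, component_id=f[5],
                                   component_beg=int(f[6]),
                                   component_end=int(f[7]),
                                   orientation=f[8]))
    return records


# Made-up chromosome-from-scaffold AGP with one inter-scaffold gap.
EXAMPLE_AGP = (
    "# hypothetical example, not real tomato data\n"
    "chr_demo\t1\t1000\t1\tW\tscaffold_A\t1\t1000\t+\n"
    "chr_demo\t1001\t1100\t2\tN\t100\tcontig\tno\tna\n"
    "chr_demo\t1101\t1600\t3\tW\tscaffold_B\t1\t500\t-\n"
)

records = parse_agp(EXAMPLE_AGP)
gap_lengths = [r.gap_length for r in records if r.is_gap]
```

Collecting gap_length from such records is one way gap-size summaries like the median and mean could be computed, for instance with the standard statistics module.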
5. SGN Workshop, PAG 2015
Jeremy Edwards
https://github.com/solgenomics/Bio-GenomeUpdate
FISH
• Order
• Orientation
• Gap sizes
Tiling Path file (TPF)
Accession Golden Path files (AGP)
NCBI
Gap extension
Scaffold flip
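The two update operations named above, gap extension and scaffold flip, can be sketched on a simplified chromosome-from-scaffold layout. This is an illustrative Python sketch under stated assumptions, not the Bio-GenomeUpdate code: the Part class, the function names, and all scaffold names and sizes are hypothetical. It assumes each scaffold is placed in full and that chromosome coordinates are simply recomputed from the ordered parts after every change.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Part:
    kind: str              # "seq" for a scaffold, "gap" for a gap
    length: int            # length in bp
    name: str = ""         # scaffold name (empty for gaps)
    orientation: str = ""  # "+" or "-" (empty for gaps)


def flip_scaffold(parts: List[Part], name: str) -> List[Part]:
    """Return a new layout with the named scaffold's orientation toggled."""
    toggled = {"+": "-", "-": "+"}
    return [
        replace(p, orientation=toggled[p.orientation])
        if p.kind == "seq" and p.name == name else p
        for p in parts
    ]


def resize_gap(parts: List[Part], gap_number: int, new_length: int) -> List[Part]:
    """Return a new layout where the gap_number-th gap (1-based) is resized."""
    out, seen = [], 0
    for p in parts:
        if p.kind == "gap":
            seen += 1
            if seen == gap_number:
                p = replace(p, length=new_length)
        out.append(p)
    return out


def layout(parts: List[Part]) -> List[Tuple[int, int, Part]]:
    """Assign 1-based inclusive chromosome coordinates to each part."""
    rows, pos = [], 1
    for p in parts:
        rows.append((pos, pos + p.length - 1, p))
        pos += p.length
    return rows


# Hypothetical starting layout: two scaffolds separated by a 100 bp gap.
parts = [
    Part("seq", 1000, "scaffold_A", "+"),
    Part("gap", 100),
    Part("seq", 500, "scaffold_B", "+"),
]

# Apply evidence such as FISH: flip scaffold_B and extend the gap to 250 bp.
updated = resize_gap(flip_scaffold(parts, "scaffold_B"), 1, 250)
coords = [(start, end) for start, end, _ in layout(updated)]
```

Every part downstream of an extended gap shifts, which is why annotation coordinates have to be lifted over after such an update. In a chromosome-from-contig AGP, a flipped scaffold would also need its contigs listed in reverse order with inverted orientations; this sketch does not model that level.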
6. SGN Workshop, PAG 2015
Jeremy Edwards
https://github.com/solgenomics/Bio-GenomeUpdate
SL2.40 Annotation
• SL2.40 AGP
• SL2.50 AGP
• SL2.40 GFF3
SL2.50 Annotation
• SL2.50 GFF3
• Validated via FASTA
Errors corrected
• Start/end coordinates in different scaffolds
• Start > end coordinates for UTRs
• Start or end coordinates in gap region
• Dropped Solyc03g053140.1 and Solyc12g032910.1
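The coordinate lift-over from one build to the next, and the error classes listed above, can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the procedure actually used for SL2.50: each scaffold is assumed to be placed in full on the chromosome in both builds, and the Placement class, the function names, and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Placement:
    name: str         # scaffold name
    start: int        # 1-based chromosome start
    end: int          # 1-based inclusive chromosome end
    orientation: str  # "+" or "-"


class LiftError(ValueError):
    """Raised when a feature cannot be lifted to the new build."""


def find(placements: List[Placement], pos: int) -> Optional[Placement]:
    """Return the scaffold covering pos, or None if pos lies in a gap."""
    for pl in placements:
        if pl.start <= pos <= pl.end:
            return pl
    return None


def to_local(pos: int, pl: Placement) -> int:
    """Chromosome coordinate -> 1-based coordinate within the scaffold."""
    return pos - pl.start + 1 if pl.orientation == "+" else pl.end - pos + 1


def to_chrom(local: int, pl: Placement) -> int:
    """Scaffold coordinate -> chromosome coordinate."""
    return pl.start + local - 1 if pl.orientation == "+" else pl.end - local + 1


def lift_feature(start: int, end: int, strand: str,
                 old: List[Placement],
                 new: List[Placement]) -> Tuple[int, int, str]:
    src_start, src_end = find(old, start), find(old, end)
    if src_start is None or src_end is None:
        raise LiftError("start or end coordinate falls in a gap region")
    if src_start.name != src_end.name:
        raise LiftError("start and end coordinates are in different scaffolds")
    target = {pl.name: pl for pl in new}[src_start.name]
    a = to_chrom(to_local(start, src_start), target)
    b = to_chrom(to_local(end, src_start), target)
    # A flipped scaffold reverses the feature, so reorder the coordinates
    # (keeping start <= end) and switch the strand.
    if src_start.orientation != target.orientation:
        strand = "-" if strand == "+" else "+"
    return min(a, b), max(a, b), strand


# Hypothetical old and new builds: gap extended, scaffold_B flipped.
OLD = [Placement("scaffold_A", 1, 1000, "+"),
       Placement("scaffold_B", 1101, 1600, "+")]
NEW = [Placement("scaffold_A", 1, 1000, "+"),
       Placement("scaffold_B", 1251, 1750, "-")]

lifted = lift_feature(1201, 1300, "+", OLD, NEW)
```

Features that raise LiftError here correspond to the first and third error classes on the slide. How such cases were resolved for SL2.50, including the two dropped gene models, is not shown by this sketch.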
20. Future Work
• Manually examine assembled BAC contigs with < 99% identity
• Evaluate HTGS phase 2 BACs
• Use PCR walking to close gaps
• Create TPF files for SL3.0
• Annotate SL3.0 and lift over annotations from SL2.50