
SL4.0 and ITAG4.0

Solanum lycopersicum Heinz 1706 genome assembly and annotation SL4.0 and ITAG4.0

Published in: Science

  1. Solanum lycopersicum Heinz 1706 genome assembly and annotation: SL4.0 and ITAG4.0. Sol Genomics Network, https://solgenomics.net/
  2. SL4.0 assembly
     ● 80x PacBio coverage with RSII and Sequel (13 kb read N50)
     ● Canu assembly (N50 5.5 Mb)
     ● Hi-C scaffolding (12 chromosomes and unplaced contigs)
     ● Corrected with Illumina DNA-seq (60x coverage)
     ● Filtered for mitochondrial and chloroplast contigs
     ● Validated with Bionano optical maps and 10X linked reads
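
The read and assembly figures on this slide are quoted as N50 values. As a minimal sketch of how that statistic is usually computed (the lengths below are toy numbers, not SL4.0 data, and the function name `n50` is only illustrative): sort the sequences from longest to shortest and report the length at which the running total first reaches half of the summed length.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp).

    N50 is the length L such that sequences of length >= L
    together contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, for illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

In practice the lengths would come from the assembly FASTA index or from the read files rather than from a hand-written list.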
  3. Comparison with the previous assemblies

     | Genome assembly version | SL4.0 | SL3.0 | SL2.5 |
     |---|---|---|---|
     | Assembly size (bp) | 782,520,133 | 828,076,956 | 823,944,041 |
     | Non-N bases (bp) | 782,475,302 | 746,357,470 | 737,636,348 |
     | Ns (bp) | 44,831 | 81,719,486 | 86,307,693 |
     | Chr 00 / unplaced contig size (bp) | 9,643,350 | 20,852,292 | 21,805,821 |
     | Number of Chr 00 contigs | 152 | 3,141 | 4,410 |
     | Repeat content (RepeatModeler/RepeatMasker) | 64.19% | 56.39% | 56.34% |
     | Repeat content (REPET) | 71.77% | 61.55% | 60.94% |
     | Assembly completeness estimate (k-mer based) | 99.24% | 98.96% | 98.83% |
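
The size figures in this comparison are internally consistent: for each version, non-N bases plus Ns equals the assembly size. The short sketch below checks that and derives the share of each assembly made up of Ns. The numbers are copied from the slide; the dictionary layout and the helper name `gap_percent` are just illustrative choices.

```python
# Sizes in bp, copied from the comparison slide.
assemblies = {
    "SL4.0": {"size": 782_520_133, "non_n": 782_475_302, "n": 44_831},
    "SL3.0": {"size": 828_076_956, "non_n": 746_357_470, "n": 81_719_486},
    "SL2.5": {"size": 823_944_041, "non_n": 737_636_348, "n": 86_307_693},
}


def gap_percent(record):
    """Percentage of the assembly that consists of N bases."""
    return 100 * record["n"] / record["size"]


for name, rec in assemblies.items():
    # Sanity check: the two base counts should add up to the total size.
    assert rec["non_n"] + rec["n"] == rec["size"]
    print(f"{name}: {gap_percent(rec):.3f}% N")
```

By this measure roughly a tenth of SL3.0 and SL2.5 is N bases, while in SL4.0 the share is well under 0.01%.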
  4. SL3.0 vs SL4.0 genome assembly co-linearity
  5. Input data for genome annotation
     - Full-length cDNA sequenced using PacBio Iso-Seq (breaker and mature green fruit stages)
     - Illumina RNA-seq data from >1,300 libraries with >14 billion reads
     - Disease resistance data (Martin and Jones labs)
     - 3' and 5' UTR-enriched data (Giovannoni, Aharoni and Sinha labs)
     - Public data from NCBI SRA
     - NCBI EST sequences (~300 K)
     - Full-length cDNA sequences (~13 K) from Micro-Tom (Aoki et al., 2010)
  6. Annotation of protein-coding gene models

     | | ITAG4.0 | ITAG2.4 |
     |---|---|---|
     | Number of protein-coding genes | 34,075 | 34,725 |
     | Average transcript length | 1,303 | 1,209 |
     | Average number of exons per gene | 4.74 | 4.61 |
     | Fraction of genes with 5' UTR | 0.49 | 0.34 |
     | Fraction of genes with 3' UTR | 0.58 | 0.41 |

     Long non-coding RNAs in ITAG4.0: 5,874, with 6,694 alternatively spliced isoforms
  7. Annotation Edit Distance (AED): AED provides a means to evaluate the quality of annotations given the evidence set. The cumulative AED plot shows that ITAG4.0 improves on ITAG2.4.
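
The slide does not give the AED formula. A minimal sketch, assuming the commonly used nucleotide-level definition AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the annotation and SP is the fraction of annotated bases covered by evidence: 0 means perfect agreement and 1 means no support. The interval representation, the function names, and the toy coordinates are all illustrative assumptions, not the ITAG pipeline itself.

```python
def covered_bases(intervals):
    """Expand 1-based inclusive (start, end) intervals into a set of positions."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def aed(annotation_exons, evidence_alignments):
    """AED under the assumed definition 1 - (SN + SP) / 2."""
    annotated = covered_bases(annotation_exons)
    supported = covered_bases(evidence_alignments)
    if not annotated or not supported:
        return 1.0  # nothing to compare: treat as unsupported
    shared = len(annotated & supported)
    sensitivity = shared / len(supported)   # evidence recovered by the annotation
    specificity = shared / len(annotated)   # annotation backed by evidence
    return 1.0 - (sensitivity + specificity) / 2


def cumulative_fraction(aed_scores, threshold):
    """Fraction of gene models with AED <= threshold (one point on a cumulative plot)."""
    return sum(score <= threshold for score in aed_scores) / len(aed_scores)


# Toy example: a 100 bp exon half-covered by a shifted 100 bp alignment.
print(aed([(1, 100)], [(51, 150)]))
```

A cumulative plot like the one on the slide would evaluate `cumulative_fraction` over a range of thresholds for each annotation release; a curve that rises faster indicates more models with strong evidence support.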
  8. Novel protein-coding genes in ITAG4.0: Novel genes in ITAG4.0 are enriched in stress response genes. GO terms enriched in novel genes are shown as fold enrichment, in minus log10 of their corresponding P-values.
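
The slide does not say which statistical test produced the P-values. As a hedged sketch of how fold enrichment and minus log10 P-values are typically derived for one GO term, the example below assumes a one-sided hypergeometric test; the counts are invented for illustration and the function names are hypothetical.

```python
from math import comb, log10


def fold_enrichment(k, n, K, N):
    """Ratio of the term's frequency in the gene set to its background frequency.

    k: genes in the set with the term, n: size of the set,
    K: background genes with the term, N: size of the background.
    """
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes without replacement from N, K of which carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Invented counts: 15 of 100 novel genes carry a term found in 50 of 1,000 genes overall.
k, n, K, N = 15, 100, 50, 1000
p_value = hypergeom_upper_tail(k, n, K, N)
print(f"fold enrichment: {fold_enrichment(k, n, K, N):.2f}")
print(f"-log10(P): {-log10(p_value):.2f}")
```

A real analysis would repeat this for every GO term and correct the P-values for multiple testing before plotting.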
  9. Thank you! Submit your annotation corrections using the Tomato Apollo annotation editor. Contact SGN for an account: https://solgenomics.net/contact/form
