Updates on the ACP v3 genome and annotation from USDA NIFA project meeting

www.citrusgreening.org
Objective 1
Data Integration and Analysis
Genome, annotation and transcriptome
Fifth Annual Meeting
Ft. Pierce, FL
Prashant Hosmani, Mirella Flores, Lukas Mueller and Surya Saha
Boyce Thompson Institute

Psyllid genomics timeline
2014
• Psyllid v1.1 genome
2015
2016
• MCOT de novo transcriptome
• Psyllid annotation OGSv1.0
• Psyllid PacBio genome v1.9
2017
2019
• IsoSeq de novo transcriptome
2018
• Psyllid PacBio genome 2.0
• Carsonella and Profftella genomes from FL
• Psyllid PacBio genome v3.0
• Wolbachia strains from FL
Manual annotation

https://www.biorxiv.org/content/10.1101/869685v1
17 students among 30 authors

PacBio genome assembly

500 ng input DNA for Dovetail Chicago from a single male psyllid
Duplicated contigs added to alternate assembly
Asian citrus psyllid (ACP) reference genome
| Metric | v1.1 | v2.0 (REFERENCE) | v3.0 (REFERENCE) |
|---|---|---|---|
| Number of contigs | 161,988 | 1,906 | 13 + unplaced |
| Total bases | 485 Mb | 498 Mb | 474 Mb |
| Longest | 1 Mb | 4.2 Mb | 50.3 Mb |
| Contig N50 | 34.4 Kb | 749 Kb | 40.5 Mb |
| Ns | 19.3 Mb | 4.5 Mb | 13.4 Mb |
| Complete BUSCO (%) | 65.9 | 75.9 | 88.3 |
| Repeat (%) | 26.37 | 31.9 | 30.2 |
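For readers unfamiliar with the Contig N50 row, the statistic can be sketched as follows. This is a minimal illustration of the standard N50 definition, not the tooling used for the assemblies above; the function name n50 and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical toy contig lengths, not values from the ACP assemblies.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

A few very long sequences dominate this statistic, which is why placing most of the assembly in 13 chromosome-length sequences raises N50 so sharply between versions.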
Chicago and Hi-C

Microbial contamination on Chr09
• Removing first 2.3 Mb
• 4-5000X depth of coverage
• Coverage of >90% of endosymbiont genome
Carsonella
Chr09
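One way to spot a region like this is to compare per-window read depth against the genome-wide baseline. The sketch below is only an illustration of that idea under stated assumptions: the window depths, the 10-fold threshold and the function name flag_high_depth_windows are all hypothetical and are not taken from the project's pipeline.

```python
from statistics import median


def flag_high_depth_windows(window_depths, fold=10.0):
    """Return indices of windows whose mean depth is at least
    `fold` times the median depth across all windows."""
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    return [i for i, depth in enumerate(window_depths) if depth >= fold * baseline]


# Hypothetical mean depths for consecutive windows along one chromosome.
toy_depths = [4500, 4800, 60, 55, 62, 58, 61]
print(flag_high_depth_windows(toy_depths))
```

Flagged windows would still need confirmation, for example by aligning them to the endosymbiont genome as described on the slide.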

Got assembly. Now what?
Comparative genomics
AgriVectors.org

Power of comparative genomics
| Species | Common name | Genome size | Lead |
|---|---|---|---|
| Cacopsylla pyricola | Pear psylla | 480-485 Mb | Rodney Cooper |
| Leuronota fagarae | Lime psyllid | 465-483 Mb | Jawwad Qureshi |
| Bactericera cockerelli | Potato psyllid | 421-426 Mb | Daisy Fu |
| Pachypsylla venusta | Hackberry petiole gall psyllid | TBD | Nancy Moran |
| Lygus lineolaris | Tarnished plant bug | TBD | OP Perera |
| Geocoris pallens | Western big-eyed bug | ~1 Gb | Nick Booster (Rosenheim lab) |
| Circulifer tenellus | Beet leafhopper | ~1 Gb | Bob Gilbertson |
| Macrosteles quadrilineatus | Aster leafhopper | TBD | Astri Wayadande |
| Graminella nigrifrons | Black-faced leafhopper | TBD | Astri Wayadande |
| Dalbulus maidis | Maize leafhopper | TBD | Astri Wayadande |

Psyllid genome annotation and
manual curation using Apollo
Prashant Hosmani
Mueller Lab

Gene prediction overview for OGS v3.0
Repeat masking
• RepeatModeler
• Protein masking
• RepeatMasker
Transcriptome
• RNA-seq: HISAT & StringTie
• Iso-Seq: GMAP & Cupcake ToFU
Mikado
• Portcullis junctions
• StringTie
• Iso-Seq
Maker
• Mikado gene loci
• Portcullis junctions
Functional annotation
• AHRD
• InterProScan
Gene predictors
• Augustus
• GeneMark

Diaphorina citri Apollo annotation editor
Collaboratory system
● Indian River State College (IRSC)
● Kansas State University (KSU)
● University of Cincinnati (UC)
● BTI / Cornell University
More than 40 registered annotators
Login to Apollo at
https://citrusgreening.org/

Pathway based manual curation
• Development
  • Segmentation
  • Wnt and other signaling pathways
  • Hox genes
• Immune response
• Metabolic and cellular functions
  • Carbohydrate metabolism
  • Chitin metabolism
  • vATPase
  • Chromatin remodeling
• Environmental/Sensory
  • Circadian rhythm
  • Phototransduction
• Reproduction
• 811 curated genes in OGSv3
• 132 updated models from OGSv1 (genome v1.1)

High-quality manually curated genes
| Annotation set | OGS1.0 | OGS2.0 | OGS3.0 | Curated |
|---|---|---|---|---|
| No. of genes | 19,311 | 20,793 | 19,049 | 811 |
| No. of transcripts | 20,966 | 25,292 | 21,345 | 916 |
| No. of exons per transcript | 5.42 | 7.06 | 7.29 | 7.87 |
| Avg. transcript length (bp) | 1,317 | 1,944 | 2,034 | 2,503 |
| Avg. exon length (bp) | 243 | 275 | 279 | 318 |
| Non-canonical splice sites | 6.05% | 3.13% | 2.47% | 1.91% |
OGS: Official Gene Set
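The non-canonical splice site percentage can be illustrated with a small sketch. It assumes that "canonical" means an intron starting with GT and ending with AG; the slides do not state which definition was used (some tools also accept GC-AG and AT-AC), and the function names and toy introns below are hypothetical.

```python
# Assumption: only GT...AG introns count as canonical in this sketch.
CANONICAL_PAIRS = {("GT", "AG")}


def is_canonical_intron(intron_seq, canonical=CANONICAL_PAIRS):
    """Check the first and last two bases of an intron (sense strand)."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to have donor and acceptor sites")
    return (seq[:2], seq[-2:]) in canonical


def noncanonical_fraction(introns):
    """Fraction of introns whose donor/acceptor pair is not canonical."""
    introns = list(introns)
    if not introns:
        raise ValueError("no introns given")
    non_canonical = sum(1 for seq in introns if not is_canonical_intron(seq))
    return non_canonical / len(introns)


# Hypothetical intron sequences, not taken from any OGS release.
toy_introns = ["GTAAGTTTTCAG", "GCAAGTTTTCAG", "GTAAGACCCAAG", "ATATCCTTTTAC"]
print(f"{noncanonical_fraction(toy_introns):.2%}")
```

A falling percentage across gene sets, as in the table, is consistent with fewer mis-predicted intron boundaries, although some non-canonical introns are genuine.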

Completeness using BUSCO single copy markers
| Gene set | BUSCO Hemiptera complete (%) | Duplicated (%) | Fragmented (%) | Missing (%) |
|---|---|---|---|---|
| OGS1 | 74.5 | 13.0 | 0.3 | 25.2 |
| OGS2 | 81.6 | 37.3 | 0.2 | 18.2 |
| OGS3 | 80.2 | 29.4 | 0.1 | 19.7 |
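A quick consistency check on these numbers, as a sketch: it assumes, as in BUSCO's standard summary, that the complete percentage already includes duplicated markers, so complete + fragmented + missing should total 100 and single-copy is complete minus duplicated. The dictionary layout and function names are hypothetical.

```python
# Percentages copied from the slide; keys are illustrative names.
busco = {
    "OGS1": {"complete": 74.5, "duplicated": 13.0, "fragmented": 0.3, "missing": 25.2},
    "OGS2": {"complete": 81.6, "duplicated": 37.3, "fragmented": 0.2, "missing": 18.2},
    "OGS3": {"complete": 80.2, "duplicated": 29.4, "fragmented": 0.1, "missing": 19.7},
}


def busco_total(row):
    """Complete + fragmented + missing, rounded to one decimal place."""
    return round(row["complete"] + row["fragmented"] + row["missing"], 1)


def single_copy(row):
    """Assumed single-copy share: complete minus duplicated."""
    return round(row["complete"] - row["duplicated"], 1)


for name, row in busco.items():
    print(name, "total:", busco_total(row), "single-copy:", single_copy(row))
```

Under that assumption the single-copy share is lower in OGS2 and OGS3 than the complete column alone suggests, because a larger part of the complete markers is duplicated.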

PacBio Iso-Seq transcriptome
Mirella Flores
Mueller Lab

De novo transcriptome input datasets
RNA-Seq
• Gut CLas+ and CLas- (Heck lab)
• Male, female, antenna and terminal abdomen (Slupsky lab)
• Salivary glands (Heck lab)
Iso-Seq
• Adult CLas+/CLas-
• Nymph CLas+/CLas-

Workflow
RNA-Seq
• De novo transcriptome assembly
• Remove contamination (endosymbionts, archaea, viral, bacteria)
• Genome based transcripts filtering
• Pfam domains coding transcripts filtering
• DcDTr (RNA-Seq transcripts): 41,457
Iso-Seq
• PacBio Iso-Seq pipeline
• Illumina data correction
• Remove contamination
• Filtering by Insecta TrEMBL set
• DcDTi (Iso-Seq transcripts): 18,804
Iso-Seq transcripts and RNA-seq transcripts clustering
• Total transcripts: 60,261
Counts shown on the diagram without a recoverable label: 2,197,769 and 196k

De novo transcriptome statistics
Genes 40,637
Transcripts 60,261
Average length 1,736.1
Smallest 108
Largest 35,954
N50 3,657 bp
BUSCO (Hemiptera dataset, number of BUSCOs: 3350)
• Complete 79.9% (single-copy 53.2%, duplicated 26.7%)
• Fragmented 0.1%
• Missing 20%
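The reported figures can be cross-checked with simple arithmetic. This sketch only uses numbers shown in the slides (the DcDTr and DcDTi counts, the total transcript count and the BUSCO percentages); the variable names are hypothetical, and the marker counts are approximate because the percentages are rounded.

```python
# Transcript counts from the workflow slide.
dcdtr = 41_457  # RNA-Seq derived transcripts
dcdti = 18_804  # Iso-Seq derived transcripts
total_transcripts = dcdtr + dcdti

# BUSCO percentages from the statistics slide (Hemiptera dataset).
n_buscos = 3350
percent = {"single_copy": 53.2, "duplicated": 26.7, "fragmented": 0.1, "missing": 20.0}
complete = round(percent["single_copy"] + percent["duplicated"], 1)

# Approximate marker counts implied by the rounded percentages.
approx_counts = {k: round(n_buscos * v / 100) for k, v in percent.items()}

print("total transcripts:", total_transcripts)
print("complete (%):", complete)
print("approximate marker counts:", approx_counts)
```

The two transcript sets add up to the reported total, and single-copy plus duplicated matches the reported complete percentage.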

Future work

Genome and annotation paper 2019-2020
• New v3.0 assembly
• First hemipteran chromosome-length genome assembly
• Curated genes from previous v1.1 and v2.0 assemblies
• Meta-paper with 11-15 sub-papers led by students for each pathway
• Iso-Seq transcriptome
