The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the causal agent of citrus greening, or Huanglongbing, a disease that threatens the citrus industry worldwide. This vector is the primary target of approaches to stop transmission of the pathogen. Accurate structural and functional annotation of the psyllid’s gene models, together with an understanding of its interactions with CLas, is required for precise targeting using molecular methods such as RNAi. We opted for manual curation of gene families in the draft genome of D. citri (Diaci v1.1, contig N50 34.4 Kb) that have key functional roles in D. citri biology and pathology. The community effort resulted in Official Gene Set v1.0, with more than 500 manually curated gene models across developmental, RNAi regulatory, and immune-related pathways.
Single-copy marker analysis of the current genome shows that a significant proportion (25%) of the 3,350 markers conserved in Hemipterans is missing, with only 74% present as full-length copies. The manual genome annotation also identified a number of misassemblies and missing genes in the current genome. This is due in part to the complexity introduced when assembling a heterogeneous sample containing DNA from multiple psyllids, exacerbated by the use of short reads. This challenge is common with insect genomes because of the small size of individuals. To improve the quality of the genome assembly, we generated 36.2 Gb of PacBio long reads, giving 80X coverage of the 450 Mb psyllid genome. The Canu assembler followed by Dovetail Chicago-based scaffolding was used to create an improved assembly (Diaci v2.0) with a contig N50 of 758.7 Kb and 1,906 contigs. The assembly was polished with PacBio and Illumina paired-end reads to remove indel and SNP errors. We are employing Dovetail Chicago and 10X Illumina libraries generated from a single psyllid, in conjunction with Bionano optical maps, to achieve long-range scaffolding of the genome. We have also generated full-length cDNA transcripts from diseased and healthy tissue from multiple life stages with PacBio Iso-Seq technology. This will be the first time all these methods have been applied to resolve a complex insect genome from a highly heterogeneous sample. The new assembly will be available at https://citrusgreening.org/, our portal for all omics resources for citrus greening disease. We are continuing the manual curation effort using the improved genome. We will also present how the improved genome and annotation are contributing to the development of molecular interdiction methods to disrupt the vectoring ability of D. citri.
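As a quick check of the coverage figure quoted in the abstract, average sequencing depth can be estimated as total read bases divided by genome size. The sketch below is only an illustration; the function and variable names are hypothetical, and the inputs are the 36.2 Gb and 450 Mb values stated above.

```python
def sequencing_coverage(total_read_bases: float, genome_size_bases: float) -> float:
    """Average depth of coverage = total sequenced bases / genome size."""
    if genome_size_bases <= 0:
        raise ValueError("genome size must be positive")
    return total_read_bases / genome_size_bases


# Values taken from the abstract: 36.2 Gb of long reads, 450 Mb genome.
pacbio_bases = 36.2e9
genome_size = 450e6

coverage = sequencing_coverage(pacbio_bases, genome_size)
print(f"Estimated coverage: {coverage:.1f}X")
```

The result is consistent with the roughly 80X coverage reported in the abstract.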
Deciphering the genome of Diaphorina citri to develop solutions for the citrus greening disease
1. www.citrusgreening.org
Surya Saha
Boyce Thompson Institute, Ithaca, NY
ss2489@cornell.edu @SahaSurya
7TH ANNUAL CORNELL ENTOMOLOGY SYMPOSIUM
January 19th, 2018
Geneva, NY
2. www.citrusgreening.org
Acknowledgements
Mueller Lab
Kansas State University
Sue Brown
Cornell University/BTI
Michelle (Cilia) Heck
USDA/ARS
Wayne Hunter
Robert Shatters
University of California, Davis
Carolyn Slupsky
Indian River State College
Tom D’Elia
Mirella Flores, Prashant Hosmani, Stephanie Hoyt
3. www.citrusgreening.org
Citrus Greening: Huanglongbing
• Most significant disease of citrus worldwide
• More than $4.5 billion in lost citrus production and more than 8,200 lost jobs (2006/07 to 2010/11)
• Associated with the Gram-negative bacterium Candidatus Liberibacter asiaticus (CLas)
• Spread by insect vector, Diaphorina citri (Asian citrus psyllid, ACP)
Annie Kruse
2017
5. Omics resources and databases are required for
identification of targets for interdiction
Genome Annotation
Target for interdiction molecules
Pathway Databases
Expression Networks
…….
Host
Vector
Pathogen
6. www.citrusgreening.org
• Genome Diaci v1.1
• Official gene set v1.0
• Immune pathway
• RNAi pathway
• P450 gene family
• 530 manually curated models
• ~20,000 NCBI predicted models
• MCOT transcriptome v1.1
R. prolixus (red), A. pisum (green) and D. citri (blue)
Four clans of P450s, CYP2 (orange), CYP3 (green), CYP4 (red)
and mito (blue) clan are shown in the phylogenetic tree.
7. www.citrusgreening.org
• 18 students involved
• >250 gene models
• >30 gene families
• 13 gene reports for publication
Annotation with Web Apollo
Weekly IRSC Annotation Meetings
Annotation by undergraduate students
Slide: Tom D’Elia
9. www.citrusgreening.org
Metabolic Pathway Database Construction
Inputs: annotated genes (e.g. C. sinensis, D. citri) and the reference pathway database (MetaCyc)
Software: PathoLogic
Output: Pathway/Genome Database (CitrusCyc, DiaphorinaCyc) linking genes, gene products, reactions, compounds and pathways
Source: Peter Karp (SRI)
• Predicts metabolic pathways
• Predicts which genes code for missing enzymes in metabolic pathways
• Infers transport reactions from transporter names
10. www.citrusgreening.org
DiaphorinaCyc cellular overview with RNAseq data
Cellular Overview of Diaphorina citri overlaid with RNAseq expression counts
Membrane proteins
Secretory
proteins
Membrane proteins
Kruse et al. 2017
http://ptools.citrusgreening.org/overviewsWeb/celOv.shtml
Pathways: 185
Enzymes: 2855
Transport Reactions: 24
Proteins: 12548
Transporters: 77
Compounds: 1180
11. www.citrusgreening.org
Psyllid Expression Network
Hosts: C. medica and C. sinensis
Treatment: Percoll gradient fractionation
Stages: Nymph and adult
Conditions: CLas+ and healthy
Colored by level of expression
Genes correlated with XP_00847050, ATP synthase subunit beta
http://pen.sgn.cornell.edu
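The slide does not say how the Psyllid Expression Network scores correlation, so the following is only a minimal sketch of one common approach: Pearson correlation between a bait gene's expression profile and every other gene across samples. The gene names, expression values and the 0.9 threshold are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sd_x * sd_y)


def correlated_genes(expression, bait, threshold=0.9):
    """Return {gene: r} for genes whose profile correlates with the bait."""
    bait_profile = expression[bait]
    hits = {}
    for gene, profile in expression.items():
        if gene == bait:
            continue
        r = pearson(bait_profile, profile)
        if r >= threshold:
            hits[gene] = r
    return hits


# Hypothetical expression counts across four samples (not real data).
toy_expression = {
    "bait_gene": [1.0, 2.0, 3.0, 4.0],
    "gene_a": [2.0, 4.0, 6.0, 8.0],   # rises with the bait
    "gene_b": [4.0, 3.0, 2.0, 1.0],   # falls as the bait rises
    "gene_c": [1.0, 3.0, 2.0, 1.0],   # no clear relationship
}

hits = correlated_genes(toy_expression, "bait_gene")
print(sorted(hits))
```

In a real analysis the samples would correspond to the hosts, stages and conditions listed on the slide, and the threshold would need to be chosen with care.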
12. www.citrusgreening.org
                     v1.91      v2.0 REFERENCE   v2.0 ALTERNATE
Number of contigs    3,681      1,906            1,751
Total bases          596 Mb     498 Mb           79.1 Mb
Longest              4.2 Mb     4.2 Mb           760.6 Kb
Shortest             1.5 Kb     6 Kb             1.5 Kb
Average length       162 Kb     261.7 Kb         45.2 Kb
Contig N50           620 Kb     749 Kb           75.1 Kb
Ns                   5.1 Mb     4.5 Mb           467 Kb
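Contig N50 is the headline metric for comparing these assemblies. As a reminder of what it measures, here is a minimal sketch of the standard definition: sort contigs from longest to shortest and report the length at which the running total first reaches half of the assembly size. The contig lengths used are a toy example, not data from the psyllid assemblies.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running to total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), total = 300.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

A higher N50 with fewer contigs, as in the v2.0 reference column, indicates a more contiguous assembly, although N50 alone says nothing about correctness.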
500ng input DNA from single male psyllid
Duplicated contigs added to alternate assembly
Contiguous assembly with longer contigs
Multiple individuals in DNA sample
Genome Diaci v1.1
Contigs 161,988
Total bases 485 Mb
Longest 1 Mb
Shortest 201 bp
Ns 19.3 Mb
Scaffold N50: 109,898 bp
Contig N50: 34,407 bp
Illumina assembly
PacBio assembly
13. Gene isoform sequencing (Iso-Seq)
Accurate gene models are necessary for
targeting assays
• Majority of genes are alternatively spliced to
produce multiple transcript isoforms.
• Iso-Seq generates full-length cDNA sequences
(full-length transcripts and gene isoforms).
Current MCOT (de novo and genome-based)
transcriptome is useful but fragmented
Korf 2013
14. www.citrusgreening.org
Iso-Seq transcriptome
Mapped to D. citri v2.0
Total isoforms: 196,419
Iso-Seq provides a comprehensive (de novo and genome-based)
transcriptome with full-length transcripts and a range of isoforms
                                   Counts
Number of genes                    14,768 (30,562 in MCOT)
Number of isoforms                 52,223
Average number of isoforms/gene    3.53
N50                                2.8 Kb
Longest                            9.7 Kb
Shortest                           100 bp
[Bar chart: Iso-Seq isoforms mapped to D. citri v2.0. Mapped: 88%; unmapped: 12%. Y-axis: number of isoforms, 0 to 200,000.]
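The summary figures on this slide can be cross-checked with simple arithmetic. The sketch below recomputes the average number of isoforms per gene from the reported counts and estimates how many of the 196,419 total isoforms mapped, assuming the 88% on the chart applies to that total. Because the percentage is rounded, the mapped count is only approximate.

```python
# Counts as reported on the slide.
genes = 14_768
isoforms = 52_223
total_isoforms = 196_419
mapped_fraction = 0.88  # rounded value read from the chart

# Average number of isoforms per gene.
isoforms_per_gene = isoforms / genes

# Approximate number of mapped isoforms (assumes 88% refers to the total).
approx_mapped = round(total_isoforms * mapped_fraction)

print(f"Isoforms per gene: {isoforms_per_gene:.3f}")
print(f"Approximate mapped isoforms: {approx_mapped:,}")
```

The ratio comes out at about 3.536, so the 3.53 on the slide matches this value truncated rather than rounded to two decimals.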
15. Improved genome and annotation will expedite
identification of targets for interdiction
Genome
PacBio v2.0
Annotation
Iso-Seq
Target for interdiction molecules
Pathway Databases
Expression Networks
…….
Host
Vector
Pathogen
16. www.citrusgreening.org
Thank you!!
Improved Asian citrus psyllid genome (v2.0)
New Iso-Seq transcriptome
Metabolic pathway database
Psyllid Expression Network (PEN)
Webinar launching the new data sets and related tools on Citrusgreening.org
March 5, 2018
Details coming soon!!
@Citrusgreening
https://www.facebook.com/citrusgreening
@SahaSurya