3. Roadmap
1. Functional annotation tools for invertebrates
2. Example: Citrus greening
3. Asian citrus psyllid (Diaphorina citri)
• Genome assembly
• Microbiome and interaction with pathogen
4. Structural annotation of genes
5. Functional annotation
• Gene Ontology (GO)
• Pathways
6. Example: Functional modeling of Infected vs Uninfected D. citri
7. Upcoming resources and annotation plans
4. How do we move from
sequence to biology?
• ARS-UA joint project to develop
common workflows and
practices for functionally
annotating invertebrate
genomes.
• Training events to support use of
these workflows.
5. 1. Functional annotation tools
Figure: Annotation workflow diagram with numbered steps 1-5
1. Identify proteins
2. Transfer function based upon sequence homology
3. Assign function based upon functional motifs/domains
4. Combine GO, QC, formatting for use
5. Pathway information
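Step 4 (combining GO annotations and applying QC) can be illustrated with a minimal sketch. It assumes the homology-based and motif-based outputs have already been parsed into (protein ID, GO ID) pairs. The function name `combine_go`, the input layout, and the toy data are hypothetical and are not taken from the project's pipeline.

```python
import re
from collections import defaultdict

# Standard GO identifier format: "GO:" followed by seven digits.
GO_ID = re.compile(r"^GO:\d{7}$")

def combine_go(*sources):
    """Merge (protein_id, go_id) pairs from several annotation sources.

    Duplicate pairs are collapsed and malformed GO IDs are dropped,
    which stands in for a very basic QC step.
    """
    combined = defaultdict(set)
    for pairs in sources:
        for protein_id, go_id in pairs:
            if GO_ID.match(go_id):
                combined[protein_id].add(go_id)
    return {protein: sorted(terms) for protein, terms in combined.items()}

# Hypothetical toy input: motif-based and homology-based annotations.
motif_based = [("prot1", "GO:0005524"), ("prot2", "GO:0016020")]
homology_based = [
    ("prot1", "GO:0005524"),  # duplicate of a motif-based annotation
    ("prot1", "GO:0004672"),  # additional term from homology
    ("prot3", "bad_id"),      # malformed ID, removed by the QC check
]

merged = combine_go(motif_based, homology_based)
print(merged)
```

A set per protein keeps the merge order-independent, so it does not matter which tool's output is read first.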
6. Current Size of EXP Only Dbs
SwissProt 72,337
TrEMBL 50,258
Invertebrate 20,741
Arthropod 12,081
Insecta 11,886
Nematode 4,941
7. So What Does this Process Get Us?
Motif/domain information for comparative & evolutionary studies
• Evolution of gene families
• Targets for genome annotation
GO information for GO enrichment
• Support for functional genomics
• GO enrichment tools that allow you to import your own GO annotations
• Targets for genome annotation
Pathways information for functional enrichment
• Identification of arthropod specific pathways
• KOBAS tool for pathways enrichment
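As a rough illustration of what "enrichment" means for GO terms or pathways, the sketch below computes a one-sided hypergeometric p-value. This is the textbook over-representation test only. Enrichment tools offer their own statistics and options, and the function name and toy numbers here are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """Probability of seeing at least k annotated genes in a list of n genes,
    drawn without replacement from N background genes of which K carry
    the GO term or pathway of interest.
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total

# Hypothetical toy numbers: 20 background genes, 5 in the pathway,
# 6 genes in the list of interest, 3 of which are in the pathway.
p_value = hypergeom_enrichment_p(k=3, n=6, K=5, N=20)
print(f"p = {p_value:.4f}")
```

In practice the resulting p-values would also be corrected for the number of terms or pathways tested.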
8. 2. Example: Citrus Greening (Huanglongbing)
• Most significant disease of citrus worldwide. 100% infection in Florida now
• More than $5 billion in lost citrus production and more than 10,000 lost jobs
• Associated with the Gram-negative bacterium Candidatus Liberibacter asiaticus (CLas)
• Spread by insect vector, Diaphorina citri (Asian citrus psyllid, ACP)
Heck Lab September 2017, UC Riverside Extension
www.citrusgreening.org
10. 3. Asian citrus psyllid genome (Diaphorina citri)
500 ng input DNA from single male psyllid
Duplicated contigs added to alternate assembly
Error correction
• DNA sequencing data
• RNA sequencing data
Duplication removal with Redundans
Scaffolding with Hi-C

                     v1.1       v2.0 (REFERENCE)   v3.0 (REFERENCE)
Number of contigs    161,988    1,906              13 + unplaced
Total bases          485 Mb     498 Mb             474 Mb
Longest              1 Mb       4.2 Mb             50.3 Mb
Contig N50           34.4 Kb    749 Kb             40.5 Mb
Ns                   19.3 Mb    4.5 Mb             13.4 Mb
Complete BUSCO (%)   65.9       75.9               88.3
Repeat (%)           26.37      31.9               30.2
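For readers unfamiliar with the contig N50 statistic reported for each assembly version, here is a minimal sketch of how it is commonly computed from a list of contig lengths. The function name and the toy lengths are illustrative and are not values from the D. citri assemblies.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")

# Hypothetical toy assembly with five contigs (lengths in bases).
toy_contigs = [2, 4, 6, 8, 10]
print(n50(toy_contigs))
```

A higher N50 means that more of the assembly sits in long contigs.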
11. CLas induces mitochondrial dysfunction in the gut
Kruse et al. 2017, Mann et al. 2018
Figure: MitoSOX staining; panels: CLas +, CLas -
12. CLas and Wolbachia localize in the same ACP gut cells
Figure: 60X magnification; panels: DAPI nuclear stain, CLas (Pathogen), Wolbachia (Endosymbiont), Merged
Kruse et al. PLoS One 2017
13. First endosymbiont genomes from Psyllid in FL
Stephanie Hoyt, Mueller lab

             Wolbachia      Profftella                   Carsonella
Assembly     10 scaffolds   1 chromosome and 1 plasmid   1 chromosome
Largest      923 Kb         471 Kb                       -
Smallest     19 Kb          4.7 Kb                       -
Total Size   2 Mb           475.7 Kb                     150 Kb

Orthology Analysis
                                                  Wolbachia   Profftella   Carsonella
Number of reference genomes                       8           2            9
Total number of conserved orthogroups             559         307          116
Number of conserved orthogroups in our assembly   557         307          106
Number of shared orthogroups (<50% genomes)       167         -            12
14. Wolbachia Strains
Scaffolds were removed from the Wolbachia
assembly resulting in a large decrease in
duplication, but a small decrease in conserved
orthogroup coverage
Based on these results we hypothesize
that there are two strains of Wolbachia
present in this sample:
• Strain 1: Scaffolds 1 and 2 cover
534/559 conserved orthogroups
• Strain 2: Scaffolds 1 and 3 cover
503/559 conserved orthogroups
Figure: Comparing genomic sequences of our Wolbachia strain 2 and reference genomes to our Wolbachia strain 1
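The orthogroup counts quoted for the two hypothesized strains are easier to compare as percentages. The short sketch below does only that arithmetic, using the counts from the slides. The helper name `coverage_pct` is illustrative.

```python
def coverage_pct(found, total):
    """Percentage of conserved orthogroups recovered, to one decimal place."""
    return round(100 * found / total, 1)

CONSERVED_WOLBACHIA = 559  # conserved orthogroups across the reference genomes

# Counts as reported on the slides.
recovered = {
    "Full assembly": 557,
    "Strain 1 (scaffolds 1 and 2)": 534,
    "Strain 2 (scaffolds 1 and 3)": 503,
}

for label, found in recovered.items():
    print(f"{label}: {coverage_pct(found, CONSERVED_WOLBACHIA)}%")
```

Both candidate strains recover roughly 90% or more of the conserved orthogroups, which is the "small decrease" in coverage described above.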
15. High quality annotation and databases are required to identify targets for interdiction
Figure: diagram with labels Host, Vector, Pathogen; Genome Annotation, Pathway Databases, Expression Networks, …; Target for interdiction molecules
25. How do we measure GO Quality?
BREADTH: all gene products should have GO annotation (for CC, MF, BP).
DEPTH: function should be as detailed as possible.
EVIDENCE: Published experiments provide direct evidence of function in that species.
Buza et al. 2008. Gene Ontology annotation quality analysis in model eukaryotes. Nucleic Acids Research, 36(2), e12.
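The three criteria above can each be summarized with a simple number. The sketch below is only an illustration of that idea and is not the scoring scheme from Buza et al. It assumes annotations are available as (protein, aspect, term, evidence code) tuples and that a depth has been precomputed for every term. The function name, term names, and depths are hypothetical toy values.

```python
# GO evidence codes for experimentally supported annotations.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
ASPECTS = {"BP", "MF", "CC"}

def annotation_quality(annotations, all_proteins, term_depth):
    """Summarize breadth, depth and evidence for a set of GO annotations.

    breadth:  fraction of proteins annotated in all three GO aspects
    depth:    mean depth of the assigned terms (deeper = more detailed)
    evidence: fraction of annotations backed by experimental evidence codes
    """
    aspects_seen = {protein: set() for protein in all_proteins}
    for protein, aspect, _term, _code in annotations:
        aspects_seen.setdefault(protein, set()).add(aspect)

    fully_covered = sum(1 for seen in aspects_seen.values() if seen >= ASPECTS)
    breadth = fully_covered / len(aspects_seen)
    depth = sum(term_depth[term] for _p, _a, term, _c in annotations) / len(annotations)
    evidence = sum(
        1 for _p, _a, _t, code in annotations if code in EXPERIMENTAL_CODES
    ) / len(annotations)
    return breadth, depth, evidence

# Hypothetical toy data: two proteins, placeholder term names and depths.
toy_depth = {"term_a": 6, "term_b": 3, "term_c": 2}
toy_annotations = [
    ("p1", "BP", "term_a", "IDA"),
    ("p1", "MF", "term_b", "IEA"),
    ("p1", "CC", "term_c", "IEA"),
    ("p2", "MF", "term_b", "ISS"),
]

breadth, depth, evidence = annotation_quality(toy_annotations, ["p1", "p2"], toy_depth)
print(breadth, depth, evidence)
```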
26. Adding Details to InterProScan GO: GOanna
Figure: GO Annotation Quality by Annotation Type (InterPro, GOanna, Combined); series: No. GO annotations, proteins annotated, Av Quality Score
InterProScan & GOanna are complementary approaches.
InterProScan provides "breadth" (some GO annotation for most proteins)
GOanna provides "depth" (more detailed GO terms for some proteins)
31. Comparison of Infected and Uninfected Samples
Infected samples: 22
Uninfected samples: 35
79% of genes have > 1 read per million in at least 22 libraries
A lot of variability across samples!
Figure: sample comparison plot; groups: Infected, Uninfected
32. Differential Expression Results
Out of 16,879 genes with nonzero total read count,
at adjusted p-value < 0.05:
LFC > 0 (up) : 3162, 19%
LFC < 0 (down): 3627, 21%
Gene-wise estimates (black) and fitted
values (red)
Blue circles are genes with high dispersion
that are outliers
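The up/down summary on this slide can be reproduced from a per-gene results table. The sketch below assumes each gene has a log2 fold change and an adjusted p-value (possibly missing). It is not the analysis code used for these data, and the function name and toy values are hypothetical.

```python
def summarize_de(results, alpha=0.05):
    """Count significantly up- and down-regulated genes.

    results: iterable of (log2_fold_change, adjusted_p) per gene;
             adjusted_p may be None when no value was computed.
    """
    results = list(results)
    significant = [
        lfc for lfc, padj in results if padj is not None and padj < alpha
    ]
    up = sum(1 for lfc in significant if lfc > 0)
    down = sum(1 for lfc in significant if lfc < 0)
    return {"tested": len(results), "up": up, "down": down}

# Hypothetical toy results for five genes.
toy_results = [(1.2, 0.01), (-0.8, 0.03), (2.0, 0.2), (-1.5, None), (0.3, 0.049)]
summary = summarize_de(toy_results)
print(summary)

# Percentages for the counts reported on the slide.
pct_up = round(100 * 3162 / 16879)
pct_down = round(100 * 3627 / 16879)
print(pct_up, pct_down)
```

The last two lines confirm that the reported counts correspond to the stated 19% and 21% of the 16,879 genes.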
33. topGO Enriched GO Biological Processes
All GO terms with p-val 0.05
Deeper shades of red indicate smaller p-values
Larger circles represent higher proportion of proteins
                        Genes    GO BP mappable genes   GO terms   GO terms p < 0.01
InterProScan            10,946   7,130                  1,384      61
InterProScan + GOanna   11,490   7,673                  2,022      58
34. topGO Enriched GO Molecular Functions
All GO terms with p-val 0.05
Deeper shades of red indicate smaller p-values
Larger circles represent higher proportion of proteins
                        Genes    GO MF mappable genes   GO terms   GO terms p < 0.01
InterProScan            10,946   3,280                  270        6
InterProScan + GOanna   11,490   9,365                  713        16
35. DEGs associated with the cytoskeleton were
upregulated in the CLas-infected midguts
36. topGO Enriched GO Cellular Component
All GO terms with p-val 0.05
Deeper shades of red indicate smaller p-values
Larger circles represent higher proportion of proteins
                        Genes    GO CC mappable genes   GO terms   GO terms p < 0.01
InterProScan            10,946   536                    111        0
InterProScan + GOanna   11,490   4,498                  447        4
38. “Localized mitochondrial dysfunction in the gut when
insects are exposed to CLas-infected trees”
Nuclear swelling and
fragmentation of the
heterochromatin
40. “D. citri might inhibit the expression of endocytosis-related genes in the midgut to prevent the further transmission of CLas”
41. Enriched Pathways: Up & Down Regulated Genes
Panels: Pathways enriched from Up-regulated genes; Pathways enriched from Down-regulated genes

Pathway                         Input number   Background number   P-Value
Gene Expression                 281            1303                3.78E-07
Endocytosis                     53             221                 0.004147464
Cell Cycle                      84             378                 0.002532175
Nonsense-Mediated Decay (NMD)   52             195                 0.000666531
siRNA biogenesis                9              18                  0.007008944
One carbon pool by folate       11             25                  0.00629025

Pathway                                                                              Input number   Background number   P-Value
Fatty Acyl-CoA Biosynthesis                                                          31             92                  0.00357436
ABC-family proteins mediated transport                                               24             70                  0.008101975
COPI-mediated anterograde transport                                                  33             106                 0.007223449
Cellular response to hypoxia                                                         12             25                  0.008017566
Formation of ATP by chemiosmotic coupling                                            12             23                  0.004869061
Regulation of cytoskeletal remodeling and cell spreading by IPP complex components   6              6                   0.005526965
42. 6. Summary of Functional Modeling
Tools for YOU!!
• Functional modeling tools to link
genomics back to biological
context
• Can now provide GO and pathway
information for functional
genomics
• InterPro motif analysis may help
guide manual annotations &
supports comparative analyses
• Tools available via AgBase &
Docker
Analysis of data sets
• Citrus greening vector (D. citri) now
has GO & pathways information
available
• GO and pathways analyses are
complementary (shared insights)
• During infection, vector
transcription and translation
responses are tissue-specific
• Lipid synthesis is down regulated
and protein transport is disrupted
• Strong links to mitochondrial
dysfunction
44. 7. Future Plans & Acknowledgements
• Continued testing and deployment of the workflows
• When to use InterProScan and when to add GOanna GO?
• Prioritizing genes for manual curation
• Identification of missing or erroneous gene families
• Optimizing pathways information
• What format will make this most useful?
• How can we improve pathway reconstruction?
• Training sessions
• Feedback on tools and documentation
• Making functional data from this project available
• i5k, NAL, AgBase and Citrusgreening.org
• Docker and Singularity based pipeline
This work was supported by funding from
the USDA Agricultural Research Service