Functional annotation of invertebrate genomes
Surya Saha, Fiona McCarthy, Amanda Cooksey,
Anna K. Childers & Monica Poelchau
suryasaha@arizona.edu | @SahaSurya
August 31st, 2020
Acknowledgements
Fiona M McCarthy Monica Poelchau Chris Childers
Mueller Lab, Boyce Thompson Institute
Roadmap
1. Functional annotation tools for invertebrates
2. Example: Citrus greening
3. Asian citrus psyllid (Diaphorina citri)
• Genome assembly
• Microbiome and interaction with pathogen
4. Structural annotation of genes
5. Functional annotation
• Gene Ontology (GO)
• Pathways
6. Example: Functional modeling of Infected vs Uninfected D. citri
7. Upcoming resources and annotation plans
How do we move from
sequence to biology?
• ARS-UA joint project to develop
common workflows and
practices for functionally
annotating invertebrate
genomes.
• Training events to support use of
these workflows.
[Figure: annotation workflow diagram with numbered steps 1 to 5]
1. Functional annotation tools
1. Identify proteins
2. Transfer function based upon sequence homology
3. Assign function based upon functional motifs/domains
4. Combine GO, QC, formatting for use
5. Pathway information
Current Size of EXP Only Dbs
SwissProt 72,337
TrEMBL 50,258
Invertebrate 20,741
Arthropod 12,081
Insecta 11,886
Nematode 4,941
So What Does this Process Get Us?
Motif/domain information for comparative & evolutionary studies
• Evolution of gene families
• Targets for genome annotation
GO information for GO enrichment
• Support for functional genomics
• GO enrichment tools that allow you to import your own GO annotations
• Targets for genome annotation
Pathways information for functional enrichment
• Identification of arthropod specific pathways
• KOBAS tool for pathways enrichment
2. Example: Citrus Greening (Huanglongbing)
• Most significant disease of citrus worldwide. 100% infection in Florida now
• More than $5 billion in lost citrus production and more than 10,000 lost jobs
• Associated with the Gram-negative bacterium Candidatus Liberibacter asiaticus (CLas)
• Spread by insect vector, Diaphorina citri (Asian citrus psyllid, ACP)
Heck Lab September 2017, UC Riverside Extension
www.citrusgreening.org
Diaphorina citri
Asian citrus psyllid (ACP)
ACP bacterial
symbionts
CLas
Citrus spp.
The biological players
Wolbachia
Profftella
Carsonella
Kruse et al. 2019, Insects
500ng input DNA from single male psyllid
Duplicated contigs added to alternate assembly
Error correction
• DNA sequencing data
• RNA sequencing data
Duplication removal with Redundans
Scaffolding with Hi-C
3. Asian citrus psyllid genome (Diaphorina citri)
                      v1.1        v2.0 REFERENCE   v3.0 REFERENCE
Number of contigs     161,988     1,906            13 + unplaced
Total bases           485 Mb      498 Mb           474 Mb
Longest               1 Mb        4.2 Mb           50.3 Mb
Contig N50            34.4 Kb     749 Kb           40.5 Mb
Ns                    19.3 Mb     4.5 Mb           13.4 Mb
Complete BUSCO (%)    65.9        75.9             88.3
Repeat (%)            26.37       31.9             30.2
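The Contig N50 row can be made concrete with a small sketch. This is a minimal illustration of the usual N50 definition (the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size). The function name and the toy lengths are hypothetical and are not D. citri data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 for empty input)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical toy assembly (lengths in bp), for illustration only.
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

A larger N50 at a similar total size, as between the versions in the table, indicates that the same sequence is held in fewer, longer pieces.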
CLas induces mitochondrial dysfunction in
the gut
Kruse et al. 2017, Mann et al. 2018
MitoSOX staining
CLas +
CLas -
CLas and Wolbachia localize in the
same ACP gut cells
DAPI nuclear stain CLas
(Pathogen)
Wolbachia
(Endosymbiont)
Merged
60X magnification
Kruse et al. 2017, PLoS One
First endosymbiont genomes from Psyllid in FL
              Wolbachia      Profftella                   Carsonella
Assembly      10 scaffolds   1 chromosome and 1 plasmid   1 chromosome
Largest       923 Kb         471 Kb                       -
Smallest      19 Kb          4.7 Kb                       -
Total Size    2 Mb           475.7 Kb                     150 Kb
Stephanie Hoyt
Mueller lab
Orthology Analysis
                                                  Wolbachia   Profftella   Carsonella
Number of reference genomes                       8           2            9
Total number of conserved orthogroups             559         307          116
Number of conserved orthogroups in our assembly   557         307          106
Number of shared orthogroups (<50% genomes)       167         -            12
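As a quick check on the orthology table, the share of conserved orthogroups recovered in each assembly can be computed directly. This is only a sketch of the arithmetic; the counts are copied from the table, while the dictionary and function names are hypothetical.

```python
# Counts copied from the orthology table; names are illustrative only.
conserved_total = {"Wolbachia": 559, "Profftella": 307, "Carsonella": 116}
conserved_found = {"Wolbachia": 557, "Profftella": 307, "Carsonella": 106}


def coverage_percent(found, total):
    """Percentage of conserved orthogroups present in an assembly."""
    return 100.0 * found / total


coverage = {
    name: coverage_percent(conserved_found[name], conserved_total[name])
    for name in conserved_total
}
for name, value in coverage.items():
    print(f"{name}: {value:.1f}% of conserved orthogroups recovered")
```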
Wolbachia Strains
Scaffolds were removed from the Wolbachia
assembly resulting in a large decrease in
duplication, but a small decrease in conserved
orthogroup coverage
Based on these results we hypothesize
that there are two strains of Wolbachia
present in this sample:
• Strain 1: Scaffolds 1 and 2 cover
534/559 conserved orthogroups
• Strain 2: Scaffolds 1 and 3 cover
503/559 conserved orthogroups
Comparing genomic sequences of our Wolbachia strain 2 and reference genomes to our Wolbachia strain 1
High quality annotation and databases are required to
identify targets for interdiction
Genome Annotation
Target for interdiction molecules
Pathway Databases
Expression Networks
…….
Host
Vector
Pathogen
4. Gene Prediction Workflow
Repeat Masking
• RepeatModeler
• Protein masking
• RepeatMasker
Transcriptome
• RNA-seq: HISAT & StringTie
• Iso-Seq: GMAP & Cupcake ToFU
Mikado
• Portcullis junctions
• StringTie
• Iso-Seq
Maker
• Mikado Gene Loci
• Portcullis junctions
Functional annotation
• AHRD
• InterProScan
Augustus
GeneMark
Prashant Hosmani
Mueller Lab
Student-driven community annotation
High-quality Manually Curated Genes
Annotation set                 OGS1.0   OGS2.0   OGS3.0   Curated
No. of genes                   19,311   20,793   19,049   811
No. of transcripts             20,966   25,292   21,345   916
No. of exons per transcript    5.42     7.06     7.29     7.87
Avg. transcript length (bp)    1,317    1,944    2,034    2,503
Avg. exon length (bp)          243      275      279      318
Non-canonical splice sites     6.05%    3.13%    2.47%    1.91%
OGS: Official Gene Set
Pathway based manual curation
• Development
• Segmentation
• Wnt and other signaling pathways
• Hox genes
• Detoxification
• Immune response
• Metabolic and cellular functions
• Carbohydrate metabolism
• Chitin metabolism
• vATPase
• Chromatin remodeling
• Environmental/Sensory
• Circadian rhythm
• Phototransduction
• Reproduction
• ~1000 curated genes in OGSv3
• ~200 updated models from OGSv1 (Diaci v1.1)
https://www.biorxiv.org/content/10.1101/869685v1
17 students among 30 authors
5. Functional annotation: InterProScan results
10,946 (57%) genes have 2,281 unique GO terms
5,311 (27%) genes are assigned to 1,159 unique pathways
Runtime
• 1-3 days with the CyVerse Discovery Environment app
• 4 hours on 64 core single node with Docker container
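Counts like those on this slide (sequences with GO terms, number of unique GO terms) could be tallied from InterProScan's tab-separated output. The sketch below assumes the standard TSV layout in which the protein accession is column 1 and GO terms are pipe-separated in column 14 when GO lookup is enabled; that layout should be checked against the InterProScan version in use. The function names and example rows are hypothetical, and this is not the project's own pipeline.

```python
import re
from collections import defaultdict

GO_ID = re.compile(r"GO:\d{7}")


def go_terms_by_protein(tsv_lines):
    """Collect the set of GO IDs reported for each protein accession."""
    terms = defaultdict(set)
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 14:
            continue  # row has no GO column (assumed to be column 14)
        terms[fields[0]].update(GO_ID.findall(fields[13]))
    # Drop proteins whose rows carried no GO IDs at all.
    return {protein: found for protein, found in terms.items() if found}


def summarize(terms):
    """Return (number of annotated proteins, number of unique GO IDs)."""
    unique_ids = set().union(*terms.values()) if terms else set()
    return len(terms), len(unique_ids)


def make_row(protein, go_field):
    """Build a hypothetical 15-column row; only columns 1 and 14 matter here."""
    return "\t".join([protein] + ["x"] * 12 + [go_field, "-"])


example_rows = [
    make_row("protA", "GO:0005515|GO:0006355"),
    make_row("protA", "GO:0005515"),
    make_row("protB", "-"),
    "protC\ttruncated row",
]
annotated, unique_terms = summarize(go_terms_by_protein(example_rows))
print(annotated, unique_terms)
```

Using a regular expression for the GO IDs keeps the sketch tolerant of suffixes that some versions append to each term.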
InterProScan Motifs and Domains
16,081 (84%) proteins have at least one motif or domain assigned
8,752 unique InterPro domains
Average 3 domains per annotated protein
[Figure: Motifs & Domains Identified by InterProScan. Bar chart, axis 0 to 1,000. Categories shown: GPCR family 3, GABA-B receptor; GPCR, family 2-like; WD40 repeat; MFS transporter; WD40/YVTN repeat-like; ARM-type_fold; Ig-like_fold; Kinase-like; Znf C2H2; P-loop NTPase]
Poorly represented gene families
[Figure: Summary of GO Biological Process. Bar chart, axis 0 to 3,000. Categories shown: nuclease activity; isomerase activity; lyase activity; structural molecule activity; phosphatase activity; enzyme regulator activity; ligase activity; GTPase activity; transferase activity, transferring acyl groups; methyltransferase activity; transferase activity, transferring glycosyl…; enzyme binding; cytoskeletal protein binding; ATPase activity; structural constituent of ribosome; DNA-binding transcription factor activity; RNA binding; peptidase activity; kinase activity; DNA binding; transmembrane transporter activity; oxidoreductase activity; ion binding]
WARNING
Dcitr05g1219011:
Slim id: GO:0044403 symbiont process
GO:0019079 viral genome replication
InterProScan GO Results: Biological Process
Poorly represented gene families
[Figure: Summary of GO Cellular Component. Bar chart, axis 0 to 1,600. Categories shown: extracellular region; endoplasmic reticulum; nucleoplasm; chromosome; cytoskeleton; plasma membrane; mitochondrion; ribosome; cytoplasm; nucleus; organelle; protein-containing complex; intracellular; cell]
WARNING Slim id: GO:0005618 cell wall 3
Dcitr00g0323011 GO:0009277 fungal-type cell wall
Dcitr00g0493011 GO:0009277 fungal-type cell wall
Dcitr00g1172011 GO:0009277 fungal-type cell wall
InterProScan GO Results: Cellular Component
How do we measure GO Quality?
BREADTH: all gene products should have GO annotation (for CC, MF, BP).
DEPTH: function should be as detailed as possible.
EVIDENCE: Published experiments provide direct evidence of function in that species.
Buza et al 2008. Gene Ontology annotation quality analysis in model eukaryotes. Nucleic acids research, 36(2), e12-e12.
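The breadth criterion can be illustrated with a simple tally of how many gene products carry at least one annotation in each GO aspect. This is a minimal sketch only: it is not the quality score from Buza et al., and the function name, protein IDs and annotations are hypothetical.

```python
def breadth(all_proteins, annotations):
    """Fraction of proteins with at least one annotation per GO aspect.

    annotations is an iterable of (protein_id, aspect) pairs, where aspect
    is one of "BP", "MF" or "CC".
    """
    covered = {"BP": set(), "MF": set(), "CC": set()}
    for protein_id, aspect in annotations:
        if aspect in covered and protein_id in all_proteins:
            covered[aspect].add(protein_id)
    total = len(all_proteins)
    return {aspect: len(ids) / total for aspect, ids in covered.items()}


# Hypothetical toy data: four proteins, one of them with no annotation.
toy_proteins = {"p1", "p2", "p3", "p4"}
toy_annotations = [
    ("p1", "BP"), ("p1", "MF"), ("p1", "BP"),  # repeated pair counted once
    ("p2", "BP"),
    ("p3", "CC"),
]
print(breadth(toy_proteins, toy_annotations))
```

Depth and evidence would need the ontology structure and evidence codes respectively, which this sketch leaves out.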
Adding Details to InterProScan GO: GOanna
[Figure: GO Annotation Quality. Chart by Annotation Type (InterPro, GOanna, Combined) showing No. GO annotations, proteins annotated, and Av Quality Score]
InterProScan & GOanna are complementary approaches.
InterProScan provides "breadth" (some GO annotation for most proteins)
GOanna provides "depth" (more detailed GO terms for some proteins)
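Because the two sources are complementary, a combined annotation set can be pictured as a per-protein union of GO terms. The sketch below shows only that union step under the assumption that each source has already been parsed into a mapping from protein ID to GO IDs; any QC or formatting done by the actual workflow is not reproduced here. All names and IDs are hypothetical.

```python
def combine_go(*sources):
    """Union GO term sets per protein across any number of annotation sources."""
    combined = {}
    for source in sources:
        for protein_id, terms in source.items():
            combined.setdefault(protein_id, set()).update(terms)
    return combined


# Hypothetical toy inputs standing in for parsed InterProScan and GOanna output.
interpro_go = {"p1": {"GO:0005515"}, "p2": {"GO:0003677"}}
goanna_go = {"p1": {"GO:0005515", "GO:0006355"}, "p3": {"GO:0016301"}}

combined_go = combine_go(interpro_go, goanna_go)
n_proteins = len(combined_go)
n_annotations = sum(len(terms) for terms in combined_go.values())
print(n_proteins, n_annotations)
```

In this toy case the second source adds a more specific term to p1 and brings in p3, which has no term in the first source.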
What does GOanna add to GO annotation?
Poorly represented functions
in InterProScan derived GO.
5. Functional annotation: Pathways
InterProScan results
• 5,311 (27%) genes are assigned to 1,159 unique pathways
• Average of 18.7 genes per pathway
• All are Reactome human pathways (R-HSA)
KOBAS Annotate results
• Assigns pathways via hits to Drosophila proteins
• 13,582 (71%) genes assigned pathways from the following databases
• 24,101 Reactome
• 3,207 KEGG PATHWAY
• 1,003 PANTHER
• 7 BioCyc
Tissues
Gut
Abdomen
Antennae
Whole body
Terminal abdomen
Leg
Thorax
Head
Midgut
Sexes
Male
Female
Stages
Egg
Nymph
Adult
Infection states
CLas-
CLas+
CLas+ Low infection
CLas+ High Infection
Host
C. sinensis
C. medica
C. reticulata
C. macrophylla
6. Example: D. citri Infected / Uninfected
RNAseq samples from various tissues and citrus hosts for
the Asian citrus psyllid
Comparison of Infected and Uninfected Samples
Infected samples: 22
Uninfected samples: 35
79% of genes have > 1 read per million in at least 22 libraries
A lot of variability across samples!
[Figure legend: Infected, Uninfected]
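The expression filter described here (more than 1 read per million in a minimum number of libraries) can be sketched in plain Python. This is an illustration of the idea rather than the code used for the analysis; the function name, default thresholds and toy counts are hypothetical.

```python
def cpm_filter(counts, min_cpm=1.0, min_libs=2):
    """Return one boolean per gene: True if CPM > min_cpm in >= min_libs libraries.

    counts[g][l] is the raw read count of gene g in library l.
    """
    n_libs = len(counts[0]) if counts else 0
    lib_sizes = [sum(row[l] for row in counts) for l in range(n_libs)]
    keep = []
    for row in counts:
        passing = 0
        for l, raw in enumerate(row):
            if lib_sizes[l] == 0:
                continue  # empty library: no CPM can be computed
            if raw * 1_000_000 / lib_sizes[l] > min_cpm:
                passing += 1
        keep.append(passing >= min_libs)
    return keep


# Hypothetical toy matrix: 3 genes x 3 libraries, each library sums to 1,000,000.
toy_counts = [
    [500000, 400000, 300000],
    [499999, 599999, 699999],
    [1, 1, 1],  # exactly 1 CPM everywhere, so it does not exceed the cutoff
]
keep = cpm_filter(toy_counts, min_cpm=1.0, min_libs=2)
fraction_kept = sum(keep) / len(keep)
print(keep, fraction_kept)
```

For the data set on this slide, the corresponding call would use min_libs=22.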
Differential Expression Results
Of 16,879 genes with nonzero total read count,
at adjusted p-value < 0.05:
LFC > 0 (up) : 3162, 19%
LFC < 0 (down): 3627, 21%
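The up and down tallies above follow from classifying each tested gene by its adjusted p-value and the sign of its log fold change (LFC). The sketch below shows that classification on hypothetical toy results and then rechecks the slide's percentages from its own counts; the function names and toy values are assumptions, not output from the actual analysis.

```python
def summarize_de(results, alpha=0.05):
    """Count significantly up- and down-regulated genes.

    results is an iterable of (log2_fold_change, adjusted_p) pairs;
    adjusted_p may be None when no adjusted p-value was assigned.
    """
    tested = up = down = 0
    for log2_fc, padj in results:
        tested += 1
        if padj is None or padj >= alpha:
            continue
        if log2_fc > 0:
            up += 1
        elif log2_fc < 0:
            down += 1
    return {"tested": tested, "up": up, "down": down}


def percent(part, whole):
    """Whole-number percentage, as shown on the slide."""
    return round(100 * part / whole)


# Hypothetical toy results, for illustration only.
toy_results = [(2.1, 0.001), (-1.3, 0.02), (0.4, 0.30), (-0.2, None)]
summary = summarize_de(toy_results)

# Arithmetic check using the counts reported on the slide.
up_pct = percent(3162, 16879)
down_pct = percent(3627, 16879)
print(summary, up_pct, down_pct)
```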
Gene-wise estimates (black) and fitted
values (red)
Blue circles are genes with high dispersion
that are outliers
topGO Enriched GO Biological Processes
All GO terms with p-val 0.05
Deeper shades of red indicate smaller p-values
Larger circles represent higher proportion of proteins
Annotation type          Genes    GO BP mappable genes   GO terms   GO terms p < 0.01
InterProScan             10,946   7,130                  1,384      61
InterProScan + GOanna    11,490   7,673                  2,022      58
topGO Enriched GO Molecular Functions
All GO terms with p-val 0.05
Deeper shades of red indicate smaller p-values
Larger circles represent higher proportion of proteins
Annotation type          Genes    GO MF mappable genes   GO terms   GO terms p < 0.01
InterProScan             10,946   3,280                  270        6
InterProScan + GOanna    11,490   9,365                  713        16
DEGs associated with the cytoskeleton were
upregulated in the CLas-infected midguts
topGO Enriched GO Cellular Component
Annotation type          Genes    GO CC mappable genes   GO terms   GO terms p < 0.01
InterProScan             10,946   536                    111        0
InterProScan + GOanna    11,490   4,498                  447        4
All GO terms with p-val 0.05
Deeper shades of red indicate smaller p-values
Larger circles represent higher proportion of proteins
Enriched Pathways (KOBAS Identify)
“Localized mitochondrial dysfunction in the gut when
insects are exposed to CLas-infected trees”
Nuclear swelling and
fragmentation of the
heterochromatin
Green: universal set
Red: Annotated genes
“D. citri might inhibit the expression of endocytosis-related genes in the midgut to prevent the further transmission of CLas”
Enriched Pathways: Up & Down Regulated Genes
Pathways enriched from Up-regulated genes
Pathway                                   Input number   Background number   P-Value
Gene Expression                           281            1303                3.78E-07
Endocytosis                               53             221                 0.004147464
Cell Cycle                                84             378                 0.002532175
Nonsense-Mediated Decay (NMD)             52             195                 0.000666531
siRNA biogenesis                          9              18                  0.007008944
One carbon pool by folate                 11             25                  0.00629025
Pathways enriched from Down-regulated genes
Pathway                                   Input number   Background number   P-Value
Fatty Acyl-CoA Biosynthesis               31             92                  0.00357436
ABC-family proteins mediated transport    24             70                  0.008101975
COPI-mediated anterograde transport       33             106                 0.007223449
Cellular response to hypoxia              12             25                  0.008017566
Formation of ATP by chemiosmotic coupling 12             23                  0.004869061
Regulation of cytoskeletal remodeling and cell spreading by IPP complex components
                                          6              6                   0.005526965
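Pathway enrichment of this kind is commonly scored with a hypergeometric (one-sided Fisher) test, which asks how surprising it is to see a given number of pathway genes in the input list. The sketch below assumes that test purely for illustration; the slides do not state which statistic produced the P-Values, and the function name and toy numbers are hypothetical. Here k plays the role of the "Input number" and K the "Background number", while the sizes of the input list (n) and background (N) do not appear on the slide.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the pathway."""
    upper = min(K, n)
    total = comb(N, n)
    favorable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favorable / total


# Hypothetical toy numbers: 10 background genes, 4 in the pathway,
# 3 genes in the input list, 2 of which hit the pathway.
p_value = hypergeom_upper_tail(k=2, K=4, n=3, N=10)
print(round(p_value, 4))
```

Enrichment tools usually apply a multiple-testing correction on top of such raw p-values, which this sketch omits.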
6. Summary of Functional Modeling
Tools for YOU!!
• Functional modeling tools to link
genomics back to biological
context
• Can now provide GO and pathway
information for functional
genomics
• InterPro motif analysis may help
guide manual annotations &
supports comparative analyses
• Tools available via AgBase &
Docker
Analysis of data sets
• Citrus greening vector (D. citri) now
has GO & pathways information
available
• GO and pathways analyses are
complementary (shared insights)
• During infection, vector
transcription and translation
responses are tissue-specific
• Lipid synthesis is down regulated
and protein transport is disrupted
• Strong links to mitochondrial
dysfunction
Accessing functional annotation resources
agbase-docs.readthedocs.io
de.cyverse.org
hub.docker.com/u/agbase
7. Future Plans & Acknowledgements
• Continued testing and deployment of the workflows
• When to use InterProScan and when to add GOanna GO?
• Prioritizing genes for manual curation
• Identification of missing or erroneous gene families
• Optimizing pathways information
• What format will make this most useful?
• How can we improve pathway reconstruction?
• Training sessions
• Feedback on tools and documentation
• Making functional data from this project available
• i5k, NAL, AgBase and Citrusgreening.org
• Docker and Singularity based pipeline
This work was supported by funding from
the USDA Agricultural Research Service
Thank you!!

National Biodiversity protection initiatives and Convention on Biological Di...PABOLU TEJASREE
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxmuralinath2
 
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...Sérgio Sacani
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxAlaminAfendy1
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...Health Advances
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationanitaento25
 

Recently uploaded (20)

Citrus Greening Disease and its Management
Citrus Greening Disease and its ManagementCitrus Greening Disease and its Management
Citrus Greening Disease and its Management
 
Shuaib Y-basedComprehensive mahmudj.pptx
Shuaib Y-basedComprehensive mahmudj.pptxShuaib Y-basedComprehensive mahmudj.pptx
Shuaib Y-basedComprehensive mahmudj.pptx
 
Musical Meetups Knowledge Graph (MMKG): a collection of evidence for historic...
Musical Meetups Knowledge Graph (MMKG): a collection of evidence for historic...Musical Meetups Knowledge Graph (MMKG): a collection of evidence for historic...
Musical Meetups Knowledge Graph (MMKG): a collection of evidence for historic...
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
A Giant Impact Origin for the First Subduction on Earth
A Giant Impact Origin for the First Subduction on EarthA Giant Impact Origin for the First Subduction on Earth
A Giant Impact Origin for the First Subduction on Earth
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiologyBLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
 
SAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniquesSAMPLING.pptx for analystical chemistry sample techniques
SAMPLING.pptx for analystical chemistry sample techniques
 
Penicillin...........................pptx
Penicillin...........................pptxPenicillin...........................pptx
Penicillin...........................pptx
 
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptxGLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
GLOBAL AND LOCAL SCENARIO OF FOOD AND NUTRITION.pptx
 
Pests of sugarcane_Binomics_IPM_Dr.UPR.pdf
Pests of sugarcane_Binomics_IPM_Dr.UPR.pdfPests of sugarcane_Binomics_IPM_Dr.UPR.pdf
Pests of sugarcane_Binomics_IPM_Dr.UPR.pdf
 
Microbial Type Culture Collection (MTCC)
Microbial Type Culture Collection (MTCC)Microbial Type Culture Collection (MTCC)
Microbial Type Culture Collection (MTCC)
 
Mammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also FunctionsMammalian Pineal Body Structure and Also Functions
Mammalian Pineal Body Structure and Also Functions
 
National Biodiversity protection initiatives and Convention on Biological Di...
National Biodiversity protection initiatives and  Convention on Biological Di...National Biodiversity protection initiatives and  Convention on Biological Di...
National Biodiversity protection initiatives and Convention on Biological Di...
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
Gliese 12 b, a temperate Earth-sized planet at 12 parsecs discovered with TES...
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
insect taxonomy importance systematics and classification
insect taxonomy importance systematics and classificationinsect taxonomy importance systematics and classification
insect taxonomy importance systematics and classification
 

Functional annotation of invertebrate genomes

  • 1. Functional annotation of invertebrate genomes Surya Saha, Fiona McCarthy, Amanda Cooksey, Anna K. Childers & Monica Poelchau suryasaha@arizona.edu | @SahaSurya August 31st, 2020
  • 2. Acknowledgements Fiona M McCarthy Monica Poelchau Chris Childers Mueller Lab, Boyce Thompson Institute
  • 3. Roadmap 1. Functional annotation tools for invertebrates 2. Example: Citrus greening 3. Asian citrus psyllid (Diaphorina citri) • Genome assembly • Microbiome and interaction with pathogen 4. Structural annotation of genes 5. Functional annotation • Gene Ontology (GO) • Pathways 6. Example: Functional modeling of Infected vs Uninfected D. citri 7. Upcoming resources and annotation plans
  • 4. How do we move from sequence to biology? • ARS-UA joint project to develop common workflows and practices for functionally annotating invertebrate genomes. • Training events to support use of these workflows. Annotation
  • 5. 1. Functional annotation tools. Annotation steps: 1. Identify proteins 2. Transfer function based upon sequence homology 3. Assign function based upon functional motifs/domains 4. Combine GO, QC, formatting for use 5. Pathway information
  • 6. Annotation steps: 1. Identify proteins 2. Transfer function based upon sequence homology 3. Assign function based upon functional motifs/domains 4. Combine GO, QC, formatting for use 5. Pathway information. Current Size of EXP Only Dbs:
      Database        Size
      SwissProt       72,337
      TrEMBL          50,258
      Invertebrate    20,741
      Arthropod       12,081
      Insecta         11,886
      Nematode        4,941
  • 7. So What Does this Process Get Us? Motif/domain information for comparative & evolutionary studies • Evolution of gene families • Targets for genome annotation GO information for GO enrichment • Support for functional genomics • GO enrichment tools that allow you to import your own GO annotations • Targets for genome annotation Pathways information for functional enrichment • Identification of arthropod-specific pathways • KOBAS tool for pathways enrichment
  • 8. 2. Example: Citrus Greening (Huanglongbing) • Most significant disease of citrus worldwide. 100% infection in Florida now • More than $5 billion in lost citrus production and more than 10,000 lost jobs • Associated with the Gram-negative bacterium Candidatus Liberibacter asiaticus (CLas) • Spread by an insect vector, Diaphorina citri (Asian citrus psyllid, ACP). Heck Lab September 2017, UC Riverside Extension. www.citrusgreening.org
  • 9. The biological players: Diaphorina citri, the Asian citrus psyllid (ACP); ACP bacterial symbionts (Wolbachia, Profftella, Carsonella); CLas; Citrus spp. Kruse et al. 2019, Insects
  • 10. 3. Asian citrus psyllid genome (Diaphorina citri). Assembly notes: 500 ng input DNA from a single male psyllid; duplicated contigs added to alternate assembly; error correction with DNA sequencing data and RNA sequencing data; duplication removal with Redundans; scaffolding with Hi-C.
      Metric               v1.1       v2.0 REFERENCE   v3.0 REFERENCE
      Number of contigs    161,988    1,906            13 + unplaced
      Total bases          485 Mb     498 Mb           474 Mb
      Longest              1 Mb       4.2 Mb           50.3 Mb
      Contig N50           34.4 Kb    749 Kb           40.5 Mb
      Ns                   19.3 Mb    4.5 Mb           13.4 Mb
      Complete BUSCO (%)   65.9       75.9             88.3
      Repeat (%)           26.37      31.9             30.2
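The "Contig N50" row in the table above can be illustrated with a minimal sketch. It assumes the usual definition of N50 (the length of the contig at which the running total of lengths, sorted longest first, reaches half of the assembly size). The contig lengths are a made-up toy set, not the D. citri assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (illustration only).
toy_contigs = [50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the sorted lengths, joining contigs into a few chromosome-length scaffolds raises it sharply even when total bases barely change, which is the pattern the v1.1 to v3.0 columns show.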
  • 11. CLas induces mitochondrial dysfunction in the gut (Kruse et al. 2017, Mann et al. 2018). Figure: MitoSOX staining, CLas+ vs. CLas-.
  • 12. CLas and Wolbachia localize in the same ACP gut cells. Figure panels (60X magnification): DAPI nuclear stain; CLas (pathogen); Wolbachia (endosymbiont); merged. Kruse et al. PLoS One 2017
  • 13. First endosymbiont genomes from Psyllid in FL (Stephanie Hoyt, Mueller lab)
      Assembly      Wolbachia      Profftella                     Carsonella
      Structure     10 scaffolds   1 chromosome and 1 plasmid     1 chromosome
      Largest       923 Kb         471 Kb                         -
      Smallest      19 Kb          4.7 Kb                         -
      Total Size    2 Mb           475.7 Kb                       150 Kb
    Orthology Analysis
      Metric                                             Wolbachia   Profftella   Carsonella
      Number of reference genomes                        8           2            9
      Total number of conserved orthogroups              559         307          116
      Number of conserved orthogroups in our assembly    557         307          106
      Number of shared orthogroups (<50% genomes)        167         -            12
  • 14. Wolbachia Strains. Scaffolds were removed from the Wolbachia assembly, resulting in a large decrease in duplication but a small decrease in conserved orthogroup coverage. Based on these results we hypothesize that there are two strains of Wolbachia present in this sample: • Strain 1: Scaffolds 1 and 2 cover 534/559 conserved orthogroups • Strain 2: Scaffolds 1 and 3 cover 503/559 conserved orthogroups. Figure: comparing genomic sequences of our Wolbachia strain 2 and reference genomes to our Wolbachia strain 1.
  • 15. High quality annotation and databases are required to identify targets for interdiction. Diagram labels: Host, Vector, Pathogen; Genome Annotation, Pathway Databases, Expression Networks, …….; Target for interdiction molecules.
  • 16. 4. Gene Prediction Workflow (Prashant Hosmani, Mueller Lab)
      Repeat Masking: RepeatModeler; Protein masking; RepeatMasker
      Transcriptome: RNA-seq with HISAT & StringTie; Iso-Seq with GMAP & Cupcake ToFU
      Mikado: Portcullis junctions; StringTie; Iso-Seq
      Maker: Mikado Gene Loci; Portcullis junctions
      Gene predictors: Augustus; GeneMark
      Functional annotation: AHRD; InterProScan
  • 18. High-quality Manually Curated Genes (OGS: Official Gene Set)
      Annotation set                   OGS1.0    OGS2.0    OGS3.0    Curated
      No. of genes                     19,311    20,793    19,049    811
      No. of transcripts               20,966    25,292    21,345    916
      No. of exons per transcript      5.42      7.06      7.29      7.87
      Avg. transcript length (bp)      1,317     1,944     2,034     2,503
      Avg. exon length (bp)            243       275       279       318
      Non-canonical splice sites       6.05%     3.13%     2.47%     1.91%
  • 19. Pathway based manual curation • Development • Segmentation • Wnt and other signaling pathways • Hox genes • Detoxification • Immune response • Metabolic and cellular functions • Carbohydrate metabolism • Chitin metabolism • vATPase • Chromatin remodeling • Environmental/Sensory • Circadian rhythm • Phototransduction • Reproduction • ~1000 curated genes in OGSv3 • ~200 updated models from OGSv1 (Diaci v1.1)
  • 21. 5. Functional annotation: InterProScan results. 10,946 (57%) genes have 2,281 unique GO terms. 5,311 (27%) genes are assigned to 1,159 unique pathways. Runtime: • 1-3 days with the CyVerse Discovery Environment app • 4 hours on a single 64-core node with the Docker container
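Coverage figures like those above can be tallied from InterProScan's tab-separated output. The sketch below assumes the standard TSV layout, in which the 1st column is the protein accession and the 14th column holds "|"-separated GO terms (present only when GO lookup was requested). The rows are hypothetical, and it counts proteins rather than genes, so a protein-to-gene mapping would still be needed to reproduce gene-level numbers.

```python
import csv
import io

GO_COLUMN = 13  # 14th column of InterProScan TSV output: GO terms, "|"-separated


def summarize_go_coverage(tsv_handle):
    """Count proteins, proteins with at least one GO term, and unique GO terms."""
    proteins = set()
    go_by_protein = {}
    reader = csv.reader(tsv_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        protein = row[0]
        proteins.add(protein)
        go_field = row[GO_COLUMN] if len(row) > GO_COLUMN else "-"
        if go_field in ("", "-"):
            continue
        # Some releases append a source tag, e.g. "GO:0005515(InterPro)".
        terms = {term.split("(")[0] for term in go_field.split("|")}
        go_by_protein.setdefault(protein, set()).update(terms)
    unique_terms = set().union(*go_by_protein.values())
    return {
        "proteins": len(proteins),
        "proteins_with_go": len(go_by_protein),
        "unique_go_terms": len(unique_terms),
    }


def make_row(protein, go_field):
    """Build a hypothetical 15-column row; only columns 1 and 14 matter here."""
    row = ["-"] * 15
    row[0] = protein
    row[GO_COLUMN] = go_field
    return "\t".join(row)


# Hypothetical demo input (not real D. citri output).
demo = io.StringIO("\n".join([
    make_row("protA", "GO:0005515|GO:0016020"),
    make_row("protA", "GO:0005515(InterPro)"),
    make_row("protB", "-"),
    make_row("protC", "GO:0003677"),
]) + "\n")
summary = summarize_go_coverage(demo)
print(summary)
```

Collecting terms in a set per protein matters because InterProScan writes one row per signature match, so the same GO term often appears on several rows for one protein.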
  • 22. InterProScan Motifs and Domains. 16,081 (84%) proteins have at least one motif or domain assigned. 8,752 unique InterPro domains. Average 3 domains per annotated protein. Bar chart "Motifs & Domains Identified by InterProScan" (x-axis 0 to 1000); categories as listed: GPCR family 3, GABA-B receptor; GPCR, family 2-like; WD40 repeat; MFS transporter; WD40/YVTN repeat-like; ARM-type_fold; Ig-like_fold; Kinase-like; Znf C2H2; P-loop NTPase. Callout: poorly represented gene families.
  • 23. InterProScan GO Results: Biological Process. Bar chart "Summary of GO Biological Process" (x-axis 0 to 3000); categories as listed: nuclease activity; isomerase activity; lyase activity; structural molecule activity; phosphatase activity; enzyme regulator activity; ligase activity; GTPase activity; transferase activity, transferring acyl groups; methyltransferase activity; transferase activity, transferring glycosyl…; enzyme binding; cytoskeletal protein binding; ATPase activity; structural constituent of ribosome; DNA-binding transcription factor activity; RNA binding; peptidase activity; kinase activity; DNA binding; transmembrane transporter activity; oxidoreductase activity; ion binding. WARNING Dcitr05g1219011: Slim id: GO:0044403 symbiont process, GO:0019079 viral genome replication. Callout: poorly represented gene families.
  • 24. InterProScan GO Results: Cellular Component. Bar chart "Summary of GO Cellular Component" (x-axis 0 to 1600); categories as listed: extracellular region; endoplasmic reticulum; nucleoplasm; chromosome; cytoskeleton; plasma membrane; mitochondrion; ribosome; cytoplasm; nucleus; organelle; protein-containing complex; intracellular; cell. WARNING Slim id: GO:0005618 cell wall (3 genes): Dcitr00g0323011, Dcitr00g0493011 and Dcitr00g1172011, each annotated with GO:0009277 fungal-type cell wall.
  • 25. How do we measure GO Quality? BREADTH: all gene products should have GO annotation (for CC, MF, BP). DEPTH: function should be as detailed as possible. EVIDENCE: Published experiments provide direct evidence of function in that species. Buza et al 2008. Gene Ontology annotation quality analysis in model eukaryotes. Nucleic acids research, 36(2), e12-e12.
  • 26. Adding Details to InterProScan GO: GOanna. Chart "GO Annotation Quality" by annotation type (InterPro, GOanna, Combined); series: No. GO annotations, proteins annotated, Av Quality Score. InterPro & GOanna are complementary approaches. InterProScan provides "breadth" (some GO annotation for most proteins). GOanna provides "depth" (more detailed GO terms for some proteins).
  • 27. What does GOanna add to GO annotation?
  • 28. What does GOanna add to GO annotation? Poorly represented functions in InterProScan derived GO.
  • 29. 5. Functional annotation: Pathways. InterProScan results: • 5,311 (27%) genes are assigned to 1,159 unique pathways • Average of 18.7 genes per pathway • All are Reactome human pathways (R-HSA). KOBAS Annotate results: • Assigns pathways via hits to Drosophila proteins • 13,582 (71%) genes assigned pathways from the following databases: • 24,101 Reactome • 3,207 KEGG PATHWAY • 1,003 PANTHER • 7 BioCyc
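The average of 18.7 genes per pathway is much larger than 5,311 / 1,159 because a gene can belong to several pathways. One plausible way to get such an average, sketched below, is to divide the total number of gene-to-pathway assignments by the number of pathways. The slides do not say how the figure was computed, so this is an assumption, and the gene and pathway labels are invented.

```python
from collections import defaultdict


def mean_genes_per_pathway(gene_to_pathways):
    """Average pathway size, counting a gene once in every pathway it belongs to."""
    members = defaultdict(set)
    for gene, pathways in gene_to_pathways.items():
        for pathway in pathways:
            members[pathway].add(gene)
    if not members:
        return 0.0
    return sum(len(genes) for genes in members.values()) / len(members)


# Hypothetical mapping: three genes, two pathways, five assignments in total.
toy_mapping = {
    "gene1": {"pathwayA", "pathwayB"},
    "gene2": {"pathwayA"},
    "gene3": {"pathwayA", "pathwayB"},
}
print(mean_genes_per_pathway(toy_mapping))  # pathway sizes 3 and 2
print(len(toy_mapping) / 2)                 # naive genes / pathways ratio
```

In the toy mapping the per-pathway average (2.5) exceeds the naive ratio (1.5) for the same reason: multi-pathway genes are counted once per pathway.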
  • 30. 6. Example: D. citri Infected / Uninfected. RNAseq samples from various tissues and citrus hosts for the Asian citrus psyllid.
      Tissues: Gut, Abdomen, Antennae, Whole body, Terminal abdomen, Leg, Thorax, Head, Midgut
      Sexes: Male, Female
      Stages: Egg, Nymph, Adult
      Infection states: CLas-, CLas+, CLas+ Low infection, CLas+ High infection
      Host: C. sinensis, C. medica, C. reticulata, C. macrophylla
  • 31. Comparison of Infected and Uninfected Samples. Infected samples: 22. Uninfected samples: 35. 79% of genes have > 1 read/million in at least 22 libraries. A lot of variability across samples! Figure groups: Infected, Uninfected.
  • 32. Differential Expression Results. Out of 16,879 genes with nonzero total read count, at adjusted p-value < 0.05: LFC > 0 (up): 3,162 (19%); LFC < 0 (down): 3,627 (21%). Dispersion plot: gene-wise estimates (black) and fitted values (red); blue circles are genes with high dispersion that are outliers.
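The up/down tally above can be reproduced from any per-gene results table. The sketch below assumes each gene has a total read count, a log2 fold change (LFC) and an adjusted p-value that may be missing. The field names and toy rows are hypothetical, and the final lines only recheck the percentages quoted on the slide.

```python
def summarize_de(genes, alpha=0.05):
    """Return (genes tested, significant up, significant down)."""
    tested = [g for g in genes if g["total_reads"] > 0]

    def significant(gene):
        # A missing adjusted p-value (None) is never counted as significant.
        return gene["padj"] is not None and gene["padj"] < alpha

    up = sum(1 for g in tested if significant(g) and g["lfc"] > 0)
    down = sum(1 for g in tested if significant(g) and g["lfc"] < 0)
    return len(tested), up, down


# Hypothetical toy results (field names are illustrative only).
toy_results = [
    {"total_reads": 120, "lfc": 1.4, "padj": 0.001},
    {"total_reads": 80, "lfc": -0.9, "padj": 0.03},
    {"total_reads": 60, "lfc": 0.2, "padj": 0.6},
    {"total_reads": 0, "lfc": 0.0, "padj": None},
    {"total_reads": 15, "lfc": -2.0, "padj": None},
]
print(summarize_de(toy_results))

# Recheck the slide's percentages from its own counts.
total, up, down = 16879, 3162, 3627
print(round(100 * up / total), round(100 * down / total))
```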
  • 33. topGO Enriched GO Biological Processes. Figure: all GO terms with p-val 0.05; deeper shades of red indicate smaller p-values; larger circles represent a higher proportion of proteins.
      Annotation source        Genes     GO BP mappable genes   GO terms   GO terms p < 0.01
      InterProScan             10,946    7,130                  1,384      61
      InterProScan + GOanna    11,490    7,673                  2,022      58
  • 34. topGO Enriched GO Molecular Functions. Figure: all GO terms with p-val 0.05; deeper shades of red indicate smaller p-values; larger circles represent a higher proportion of proteins.
      Annotation source        Genes     GO MF mappable genes   GO terms   GO terms p < 0.01
      InterProScan             10,946    3,280                  270        6
      InterProScan + GOanna    11,490    9,365                  713        16
  • 35. DEGs associated with the cytoskeleton were upregulated in the CLas-infected midguts
  • 36. topGO Enriched GO Cellular Component. Figure: all GO terms with p-val 0.05; deeper shades of red indicate smaller p-values; larger circles represent a higher proportion of proteins.
      Annotation source        Genes     GO CC mappable genes   GO terms   GO terms p < 0.01
      InterProScan             10,946    536                    111        0
      InterProScan + GOanna    11,490    4,498                  447        4
  • 38. “Localized mitochondrial dysfunction in the gut when insects are exposed to CLas-infected trees” Nuclear swelling and fragmentation of the heterochromatin
  • 39. Green: universal set Red: Annotated genes
  • 40. “D. citri might inhibit the expression of endocytosis-related genes in the midgut to prevent the further transmission of Clas”
  • 41. Enriched Pathways: Up & Down Regulated Genes
    Pathways enriched from Up-regulated genes
      Pathway                         Input number   Background number   P-Value
      Gene Expression                 281            1303                3.78E-07
      Endocytosis                     53             221                 0.004147464
      Cell Cycle                      84             378                 0.002532175
      Nonsense-Mediated Decay (NMD)   52             195                 0.000666531
      siRNA biogenesis                9              18                  0.007008944
      One carbon pool by folate       11             25                  0.00629025
    Pathways enriched from Down-regulated genes
      Pathway                                                                              Input number   Background number   P-Value
      Fatty Acyl-CoA Biosynthesis                                                          31             92                  0.00357436
      ABC-family proteins mediated transport                                               24             70                  0.008101975
      COPI-mediated anterograde transport                                                  33             106                 0.007223449
      Cellular response to hypoxia                                                         12             25                  0.008017566
      Formation of ATP by chemiosmotic coupling                                            12             23                  0.004869061
      Regulation of cytoskeletal remodeling and cell spreading by IPP complex components   6              6                   0.005526965
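The "Input number" and "Background number" columns are the ingredients of an over-representation test. The sketch below uses a one-sided hypergeometric test, a common choice for pathway enrichment. The slides do not state which test or which totals were used, so the function and the toy totals are assumptions, and the slide's p-values cannot be reproduced from it.

```python
from math import comb


def hypergeom_enrichment_p(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the pathway.

    k: input genes in the pathway ("Input number")
    K: background genes in the pathway ("Background number")
    n: total input genes; N: total background genes (both assumed known)
    """
    if not (0 <= k <= min(K, n) and K <= N and n <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    # comb() returns 0 when the lower argument exceeds the upper, so
    # impossible draws contribute nothing to the tail sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical toy numbers: 10 background genes, 5 in the pathway,
# 5 input genes, all 5 of them in the pathway.
print(hypergeom_enrichment_p(5, 5, 5, 10))
```

Exact integer arithmetic with `math.comb` avoids the rounding problems of multiplying many small probabilities, at the cost of speed on very large gene sets.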
  • 42. 6. Summary of Functional Modeling. Tools for YOU!! • Functional modeling tools to link genomics back to biological context • Can now provide GO and pathway information for functional genomics • InterPro motif analysis may help guide manual annotations & supports comparative analyses • Tools available via AgBase & Docker. Analysis of data sets: • Citrus greening vector (D. citri) now has GO & pathways information available • GO and pathways analyses are complementary (shared insights) • During infection, vector transcription and translation responses are tissue-specific • Lipid synthesis is down-regulated and protein transport is disrupted • Strong links to mitochondrial dysfunction
  • 43. Accessing functional annotation resources agbase-docs.readthedocs.io de.cyverse.org hub.docker.com/u/agbase
  • 44. 7. Future Plans & Acknowledgements • Continued testing and deployment of the workflows • When to use InterProScan and when to add GOanna GO? • Prioritizing genes for manual curation • Identification of missing or erroneous gene families • Optimizing pathways information • What format will make this most useful? • How can we improve pathway reconstruction? • Training sessions • Feedback on tools and documentation • Making functional data from this project available • i5k, NAL, AgBase and Citrusgreening.org • Docker and Singularity based pipeline This work was supported by funding from the USDA Agricultural Research Service