Differential gene expression in Polistes dominula
Daniel S. Standage, Brendel Group Meeting, 21 Nov 2013


Context
 Basic differential expression analysis
 Isoform-level analysis unreliable
 Refocused on locus-level DE analysis
Interval loci (iLoci)
 Partition genome into segments that contain
 0 protein-coding genes
 1 protein-coding gene
 2 or more overlapping protein-coding genes

 P. dominula genome contains 18,675 iLoci
 8,531 with 0 genes
 9,197 with 1 gene
 947 with 2-5 genes
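
The partitioning idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: genes are given as 1-based inclusive (start, end) intervals on a single sequence, overlapping genes are merged into one segment, and everything in between becomes a 0-gene segment. The function name `partition_iloci` and the toy coordinates are hypothetical, and the actual iLocus procedure may handle boundaries differently.

```python
def partition_iloci(seq_length, genes):
    """Partition the range [1, seq_length] into interval loci.

    genes: list of (start, end) tuples, 1-based inclusive (assumed input format).
    Returns a list of (start, end, gene_count) tuples covering the sequence.
    """
    iloci = []
    pos = 1      # next coordinate not yet assigned to a segment
    cur = None   # [start, end, gene_count] of the gene cluster being built
    for start, end in sorted(genes):
        if cur is not None and start <= cur[1]:
            # Gene overlaps the current cluster: extend it and count the gene
            cur[1] = max(cur[1], end)
            cur[2] += 1
            continue
        if cur is not None:
            # Current cluster is finished: emit it
            iloci.append(tuple(cur))
            pos = cur[1] + 1
        if start > pos:
            # Gap before this gene becomes a 0-gene segment
            iloci.append((pos, start - 1, 0))
        cur = [start, end, 1]
    if cur is not None:
        iloci.append(tuple(cur))
        pos = cur[1] + 1
    if pos <= seq_length:
        # Trailing 0-gene segment up to the end of the sequence
        iloci.append((pos, seq_length, 0))
    return iloci


# Toy example: two overlapping genes and one isolated gene
toy = partition_iloci(1000, [(100, 200), (150, 300), (500, 600)])
```

On the toy input, the two overlapping genes end up in a single 2-gene segment, the isolated gene gets its own segment, and the three remaining stretches are 0-gene segments.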
Out-of-the-box analysis
 RSEM: estimate expression levels for each sample independently (uses Bowtie to align reads)
 Combine expression data into a single matrix
 EBSeq: normalize expression levels and identify differentially expressed genes
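
The "combine into a single matrix" step can be sketched as follows. This is a minimal illustration, not the script used for the analysis: it assumes each sample's results are a tab-separated table with a header row containing `gene_id` and `expected_count` columns, and the sample names and locus IDs below are made up. Loci missing from a sample are filled with 0.

```python
import csv
import io


def read_counts(handle, id_col="gene_id", value_col="expected_count"):
    """Read one per-sample table into {locus_id: count}.

    The column names are assumptions about the per-sample output format.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_col]: float(row[value_col]) for row in reader}


def build_matrix(sample_tables):
    """Merge per-sample tables into one loci-by-samples matrix.

    sample_tables: {sample_name: {locus_id: count}}
    Returns (sample_names, {locus_id: [count for each sample, in order]}).
    """
    samples = sorted(sample_tables)
    loci = sorted(set().union(*(t.keys() for t in sample_tables.values())))
    matrix = {
        locus: [sample_tables[s].get(locus, 0.0) for s in samples]
        for locus in loci
    }
    return samples, matrix


# Hypothetical two-sample example (one queen, one worker)
q1 = "gene_id\texpected_count\nlocA\t10\nlocB\t0\n"
w1 = "gene_id\texpected_count\nlocA\t3\nlocC\t7\n"
tables = {
    "Q1": read_counts(io.StringIO(q1)),
    "W1": read_counts(io.StringIO(w1)),
}
samples, matrix = build_matrix(tables)
```

In practice the handles would be opened files, one per sample; the in-memory strings are used here only to keep the sketch self-contained.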
Results and observations
 295 differentially expressed iLoci
 Grouping of samples is troubling
 Similar concerns as with previous analysis
 Some iLoci with very many reads mapped
 Some iLoci with very few reads mapped
 Concerns about normalizing over such a large dynamic range
Results and observations
 294 differentially expressed iLoci
 Grouping of samples is troubling
 Similar concerns as with previous analysis
 Some iLoci with very many reads mapped
 Some iLoci with very few reads mapped
 Concerns about normalizing over such a large dynamic range
iLocus filtering
 Filtered the iLoci based on
 Number of reads mapped
 Number of samples with reads mapped
 Distribution of mapped reads across samples

 10,043 / 18,675 iLoci (54%) passed filtering criteria
 Re-ran RSEM/EBSeq procedure from scratch
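
A filter along the three criteria listed above might look like the sketch below. The slides do not state the thresholds that were used, so `min_total`, `min_samples`, and `max_share` are hypothetical placeholders, and "distribution of mapped reads across samples" is interpreted here, as an assumption, as a cap on the share of reads coming from any single sample.

```python
def passes_filter(counts, min_total=50, min_samples=3, max_share=0.9):
    """Decide whether one iLocus passes filtering.

    counts: mapped-read counts for this iLocus, one value per sample.
    All three thresholds are hypothetical; the slides do not give them.
    """
    total = sum(counts)
    if total == 0 or total < min_total:
        return False  # too few reads mapped overall
    if sum(1 for c in counts if c > 0) < min_samples:
        return False  # reads mapped in too few samples
    if max(counts) / total > max_share:
        return False  # reads concentrated in a single sample
    return True


def filter_iloci(matrix, **thresholds):
    """Keep only the iLoci (rows of the matrix) that pass the filter."""
    return {
        locus: counts
        for locus, counts in matrix.items()
        if passes_filter(counts, **thresholds)
    }


# Toy matrix with four samples per iLocus
toy_matrix = {
    "a": [20, 25, 30, 15],   # passes all criteria
    "b": [1, 0, 0, 2],       # too few reads
    "c": [500, 1, 1, 1],     # dominated by one sample
    "d": [40, 40, 0, 0],     # expressed in too few samples
}
kept = filter_iloci(toy_matrix)
```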
New results
 123 differentially expressed iLoci
 1 sample (queen 4) still inconsistently grouped
Analysis sans Q4
 Removed the Q4 sample and re-ran EBSeq step
 Verified normalization is working as we expected
 Found very clean result
Analysis sans Q4
 Identified 314 differentially expressed iLoci
 219 (70%) over-expressed in workers
 95 contain 0 genes
 197 contain 1 gene
 22 contain 2 or more genes
Biological interpretation
 Manual analysis of DE iLoci
 xGDBvm
 yrGATE

 Two protein families occurred very frequently
 Cytochrome P450s
 NADH dehydrogenases
5 questions
 How many CYP genes are in the wasp genome?
 What percentage of these CYP genes are DE?
 Do CYPs and NADH dehydrogenases belong to the same pathways?
 Can the CYP genes in the genome be categorized?
 Can reads discarded during genome assembly provide insight into mitochondrial contamination?
CYPs in Polistes dominula
 Identified with a basic BLASTP search
 Query: translations of Maker annotations
 Database: Hymenopteran CYPs from NCBI

 154 iLoci potentially contain CYP genes
 Not all matched queries represent CYPs
 Stricter criteria required for high-confidence count
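
One way to apply stricter criteria is to filter the BLASTP hits before counting. The sketch below assumes the search was saved in BLAST's standard 12-column tabular format (`-outfmt 6`: query ID, subject ID, percent identity, alignment length, ..., e-value, bit score). The thresholds and the IDs in the example are hypothetical and would need tuning against known CYPs.

```python
def confident_queries(blast_lines, max_evalue=1e-20, min_pident=40.0,
                      min_length=200):
    """Return query IDs with at least one hit passing all thresholds.

    blast_lines: iterable of tab-separated BLAST tabular lines (assumed
    12-column layout). Threshold values are illustrative assumptions.
    """
    keep = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid = fields[0]
        pident = float(fields[2])    # percent identity
        length = int(fields[3])      # alignment length
        evalue = float(fields[10])   # e-value
        if (evalue <= max_evalue and pident >= min_pident
                and length >= min_length):
            keep.add(qseqid)
    return keep


# Hypothetical hits: mrna2 is a short, weak match and is dropped
hits = [
    "mrna1\tcypA\t55.0\t450\t200\t2\t1\t450\t1\t450\t1e-150\t500",
    "mrna2\tcypB\t30.0\t80\t56\t0\t1\t80\t1\t80\t1e-5\t45",
    "mrna3\tcypA\t45.0\t300\t160\t5\t1\t300\t10\t310\t1e-60\t220",
]
confident = confident_queries(hits)
```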
Differentially expressed CYP genes
 Took intersection of 2 lists
 mRNAs from DE iLoci
 mRNAs potentially encoding CYPs

 Identified 12 putative DE CYPs
 11 verified manually
 9 / 11 over-expressed in queens
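
The intersection step is a plain set operation. A minimal sketch, with made-up mRNA IDs standing in for the two real lists:

```python
def intersect_ids(de_mrnas, cyp_mrnas):
    """Return the sorted IDs present in both lists."""
    return sorted(set(de_mrnas) & set(cyp_mrnas))


# Hypothetical IDs for illustration only
de_mrnas = ["mrna7", "mrna1", "mrna3", "mrna9"]    # mRNAs from DE iLoci
cyp_mrnas = ["mrna3", "mrna4", "mrna7"]            # mRNAs with CYP hits
putative_de_cyps = intersect_ids(de_mrnas, cyp_mrnas)
```

Candidates found this way are only putative; as on the slide, each one still needs manual verification.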
DE NADH dehydrogenase genes
 BLASTP search found 38 potential NADHdh genes
 12-15 DE NADHdh genes
 16 putative DE NADHdh genes
 1 thrown out by manual examination (15 remain)
 3 of the remaining 15 are borderline

 14 / 15 are over-expressed in workers
Brendel Group Presentation: 21 Nov 2013
