2014 wcgalp


Published on

WCGALP talk, August 18th, 2014.

Published in: Science

  • Genetic resistance to MD is complex and controlled by many genes. The B locus is a major locus, and the incidence of MD varies widely among haplotypes. The ADOL lines 6 and 7 share the B2 haplotype but differ greatly in MD resistance. Several studies have been conducted to identify
  • Data from a much larger scale could be used to generate a hypothesis for studying a mechanism of MD resistance.
    Understanding the mechanism of disease and MD resistance could lead to development of better vaccines.
  • As you know…
  • Some unique splice junctions are also found in other datasets. Ensembl models have many unique splice junctions because the models include genes and isoforms from other tissues.
  • Incorporating Ensembl models helps, but Ensembl models do not include all genes in our samples.
  • Note that it is fortunate that GOSeq supports custom KEGG annotation. Most tools do not accept custom annotation, so you can only use annotation of one species at a time.
  • It should be pointed out that the phagosome pathway is only enriched in line 7. The phagosome pathway, as you know, is critically important for the activation of T cells and elicitation of adaptive immune responses because genes in this pathway are involved in phagocytosis and antigen presentation.
  • As expected, biological processes involved in adaptive immune responses are only enriched in line 7.
  • If we could tag or barcode all reads, it’d be easy to estimate isoform expression.
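The note on custom KEGG annotation, and slide 14's "Must merge in human KEGG annotations", can be pictured as a union of two gene-to-pathway tables. The sketch below is only an illustration under stated assumptions: the gene and pathway names are made-up placeholders rather than real KEGG content, `merge_annotations` is a hypothetical helper and not part of GOseq, and it assumes chicken and human entries have already been mapped to a shared gene identifier.

```python
from collections import defaultdict


def merge_annotations(*gene_to_pathways):
    """Union several gene -> pathway-set mappings into one mapping."""
    merged = defaultdict(set)
    for mapping in gene_to_pathways:
        for gene, pathways in mapping.items():
            # update() copies members into a fresh set, so inputs stay untouched.
            merged[gene].update(pathways)
    return dict(merged)


# Made-up placeholder data (assumption), keyed by a shared gene identifier.
chicken_kegg = {"GENE_A": {"pathway_1"}, "GENE_B": {"pathway_2"}}
human_kegg = {"GENE_A": {"pathway_1", "pathway_3"}, "GENE_C": {"pathway_3"}}

merged_kegg = merge_annotations(chicken_kegg, human_kegg)

# Genes that only gain an annotation once the human table is merged in.
newly_annotated = sorted(set(merged_kegg) - set(chicken_kegg))
print(newly_annotated)
```

A merged table of this shape is the kind of custom mapping an enrichment tool would then take in place of a single-species annotation.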

    1. 1. Exploring Marek’s Disease Resistance with RNAseq C. Titus Brown Michigan State University
    2. 2. Genetic resistance to Marek’s Disease • MHC (B) locus has a major influence on MD resistance • Several haplotypes of the B locus have been found to correlate with resistance – B21: most resistant – B19: susceptible • Lines 6 and 7 (ADOL*) are B2 homozygous, but line 6 is resistant and line 7 is susceptible to MD • Relatively few non-MHC genes have been identified *Avian Disease and Oncology Laboratory, East Lansing
    3. 3. Research Goal • Identify non-MHC genes influencing MD resistance from a genome-wide gene and isoform expression analysis based on RNA-Seq data • Generate hypotheses for studying the mechanism controlling MD resistance Collaboration with Hans Cheng (ADOL) and Jerry Dodgson (MSU) Dr. Likit Preeyanon
    4. 4. Research Plan • Lines 6 and 7 • Control and infected (4 dpi) • Single-end and paired-end Illumina sequencing • Analyses: differential gene expression, pathway analysis, differential exon usage [Figure: example sequence reads and exon diagrams] Dr. Likit Preeyanon
    5. 5. RNA-Seq Method [Figure: transcripts with poly(A) tails, fragmented and sequenced as short reads (<200 bp)] Adapted from Shirley et al Nat Methods 2009
    6. 6. Gene models and isoforms are woefully incomplete – e.g. ENSEMBL missing many exon-exon junctions. De novo reconstruction Ab initio reconstruction Dr. Likit Preeyanon
    7. 7. GIMME: Software for Merging Gene Models [Diagram: assembly-based, local-assembly, and reference-guided models → GIMME → merged models] In-house software Dr. Likit Preeyanon
    8. 8. Merged Gene Models Global Assembly Local Assembly Reference-guided Merged (consensus) Model Newly predicted isoform
    9. 9. Merged models connect fragmented gene models & provide new isoforms Merged models can glue fragmented gene models and include unannotated isoforms. Gene B Gene A Gene A Reference-guided Merged model
    10. 10. IDH3A Gene – now with both UTRs! Merged RefSeq ENSEMBL UTR
    11. 11. IDH3A: different models, different predicted expression… (SE: single-end, PE: paired-end) [Figure labels: not significant, significant]
    12. 12. Differentially Expressed Genes from Different Gene Model Sets …Differ. DE genes by EBSeq, FDR < 0.05 [Figure label: Ref-guided]
    13. 13. In addition, many of the differentially expressed genes are not annotated in KEGG [Figure label: Ref-guided]
    14. 14. GOseq FDR 0.05 Chicken + Human KEGG Pathway 40 pathways Must merge in human KEGG annotations
    15. 15. Enriched KEGG Pathways by GOSeq GOseq FDR < 0.05
    16. 16. Biological Processes (BP) categories involved in Adaptive Immune Responses are Enriched in Line 7 (susceptible)
        GO ID    Description                             Adjusted p-value
        0009615  Response to virus                       0.00023
        0050670  Regulation of lymphocyte proliferation  0.00048
        0002252  Immune effector process                 0.00068
        0051249  Regulation of lymphocyte activation     0.0027
        0042129  Regulation of T cell proliferation      0.0032
        0002250  Adaptive immune response                0.0106
        At an early stage of infection, elicitation of the adaptive immune responses appears to be delayed in line 6.
    17. 17. Isoform Expression Estimation • Sample A: gene expression = 400x, isoform shares 20% / 80% • Sample B: gene expression = 405x, isoform shares 2% / 98%
    18. 18. How to Estimate Isoform Expression Spliced reads
    19. 19. Differential Exon Usage of ITGB2 Gene from MISO Spliced reads Percent Spliced In (Ψ) Read coverage
    20. 20. Genes with predicted differential splicing can be categorized into four groups (cutoff = 0.2; Ψ plotted from 0 to 1 for 6 Ctrl, 6 Inf, 7 Ctrl, 7 Inf) • Group I: 11 genes • Group II: 19 genes • Group III: 20 genes • Group IV: 1 gene
    21. 21. The main point • We are completely at the mercy of annotations to interpret our large-scale data. • Need more experimental information! • But also, better methods => better signal
    22. 22. Concluding thoughts (I) • Computational analysis of high-throughput sequencing data can help refine hypotheses, but cannot conclusively resolve mechanism. • Don’t knock “refining hypotheses”, though! Complex biological phenomena like disease are refractory to simplifying assumptions.
    23. 23. Concluding thoughts (II) • Much of the -omic data being gathered by all of you has utility far beyond your specific research question. • This is particularly true in “semi-model” organisms where annotations are generally poor and not species-specific, and where there may be significant intra-species variation. • How can we better share this data, to make faster and better progress?
    24. 24. Where should we spend our -omics money? • Improving genomes is still expensive and requires significant technical expertise. • mRNAseq is inexpensive, broadly useful and wonderful for building better gene models. • Proteomics and metabolomics? • Better tools, annotation, and data sharing and exploration portals are critically important to the future of (agricultural) genomics. Thanks!
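The isoform slides (17, 19 and 20) lend themselves to a little arithmetic. The sketch below uses the gene expression values and isoform shares from slide 17 to show how a nearly unchanged gene-level total can hide a large isoform-level shift. It then gives a naive read-count estimate of percent spliced in (Ψ) and a cutoff test. Several things here are assumptions: the read counts are invented, the naive ratio is not the model MISO actually fits, and applying the 0.2 cutoff to the absolute difference in Ψ (with `>=`) is one plausible reading of slide 20 rather than something the slides state.

```python
def isoform_expression(gene_expression, shares):
    """Split a gene-level expression value across isoforms by their shares."""
    return [gene_expression * share for share in shares]


def naive_psi(inclusion_reads, exclusion_reads):
    """Naive percent-spliced-in: inclusion reads over all informative reads.

    Returns None when there are no informative reads. This is a
    simplification for illustration, not MISO's estimator.
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


def exceeds_cutoff(psi_a, psi_b, cutoff=0.2):
    """True when two Psi values differ by at least the cutoff (assumed rule)."""
    return abs(psi_a - psi_b) >= cutoff


# Slide 17 numbers: gene-level totals barely move between samples...
iso_a = isoform_expression(400.0, (0.20, 0.80))
iso_b = isoform_expression(405.0, (0.02, 0.98))
gene_fold_change = 405.0 / 400.0

# ...while the first isoform drops roughly tenfold.
minor_isoform_fold_change = iso_a[0] / iso_b[0]
print(gene_fold_change, minor_isoform_fold_change)

# Invented read counts (assumption) for one exon in two conditions.
psi_control = naive_psi(inclusion_reads=30, exclusion_reads=10)
psi_infected = naive_psi(inclusion_reads=16, exclusion_reads=24)
print(psi_control, psi_infected, exceeds_cutoff(psi_control, psi_infected))
```

The first half is the reason gene-level differential expression alone can miss cases like slide 17. The second half only works when reads that distinguish the isoforms (such as spliced reads) are available.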