RNAseq analysis: Differential gene expression (2/2)
Hopscotch and isoforms
August 25, 2011
[Pink Sherbet Photography]

Quick recap: Mapping and transcript assembly
  • Reads -> alignment to reference genome -> transcript assembly
  • Resulting file types: BAM, GFF/BED
  • "What transcripts are in my samples?"
  [Diagram: Projects, Fastq, Mapping, Transcript assembly]
  Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.

RNAseq analysis question
  • Is there a difference in the transcriptome between two conditions?
  • Quantify expression
  • Quantify difference
  [Figure: Condition 1 vs. Condition 2]

RNAseq vs. expression array
  • RNAseq can capture a larger dynamic range
  • RNAseq can handle degraded samples
  • Gain additional information
      • New transcripts
      • (New) isoforms
      • Variants
  [Figure: Array vs. RNA-seq; array signal flattening out]
  Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009 PMID: 19015660.

Challenges
  • Strand-specific methods are still biased
  • The number of reads does not necessarily correlate with transcript abundance
      • Longer transcripts have more reads (fragmentation)
      • Technical variability between runs causes different numbers of total reads
  • Lowly abundant does not mean non-functional
  • How to quantify the expression of isoforms?
  Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nat Rev Genet. 2011 PMID: 21191423.
  Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.

Production Informatics and Bioinformatics
  • Basic Production Informatics: produce raw sequence reads
  • Advanced Production Informatics: map to genome and generate raw genomic features (e.g. SNPs)
  • Bioinformatics Research: analyze the data; uncover the biological meaning
  (Per one-flowcell project)

Quantifying expression in RNAseq
  • Long genes get more reads
  • Normalize: fragments per kilobase of transcript per million mapped reads (FPKM)
  • FPKM accounts for the dependency between paired-end reads
  Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
  Oshlack A, Wakefield MJ. Transcript length bias in RNA-seq data confounds systems biology. Biol Direct. 2009 PMID: 19371405.
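
The FPKM normalization can be written out as a small calculation. The following is a minimal sketch, not taken from the slides: the function name `fpkm` and the example numbers are hypothetical, and a "fragment" is assumed to be one sequenced molecule, so a read pair counts once.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Hypothetical helper for illustration. A read pair is assumed to have
    been counted as a single fragment before this function is called.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # fragments / (length in kb) / (total in millions)
    #   = fragments * 1e3 * 1e6 / (length_bp * total)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Invented example: the long transcript collects twice as many fragments,
# but both transcripts end up with the same normalized expression.
long_transcript = fpkm(500, 2000, 10_000_000)
short_transcript = fpkm(250, 1000, 10_000_000)
print(long_transcript, short_transcript)
```

Dividing by transcript length addresses the "long genes get more reads" effect, and dividing by the total number of mapped fragments addresses differing sequencing depth between runs.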

Quantifying expression of overlapping isoforms
  • We do not know where reads of overlapping isoforms actually belong
  • ALEXA-Seq: counts only the reads that map uniquely to a single isoform
  • Isoform-expression methods (Cufflinks): likelihood function modeling the sequencing process (not very accurate for lowly expressed transcripts)
  • 'Exon intersection method' (analogous to expression microarrays): counts reads mapped to a gene's constitutive exons (reduces power for differential expression analysis)
  • 'Exon union method': counts all reads mapped to any exon in any of the gene's isoforms (underestimates expression for alternatively spliced genes)
  Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
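
The difference between the exon union and exon intersection methods can be illustrated with a toy example. This is a simplified sketch under stated assumptions: each read has already been assigned to a single exon label, the gene model and read list are invented, and all function names are hypothetical. Real pipelines work on genomic coordinates and must also handle spliced reads.

```python
def union_exons(isoforms):
    """All exons that occur in at least one isoform of the gene."""
    exons = set()
    for isoform_exons in isoforms.values():
        exons |= set(isoform_exons)
    return exons


def constitutive_exons(isoforms):
    """Exons shared by every isoform of the gene (the intersection)."""
    exon_sets = [set(e) for e in isoforms.values()]
    if not exon_sets:
        return set()
    return set.intersection(*exon_sets)


def count_reads(read_exons, exons):
    """Count reads whose assigned exon is in the given exon set."""
    return sum(1 for exon in read_exons if exon in exons)


# Invented gene with two isoforms; isoform B skips exon e2.
isoforms = {"A": ["e1", "e2", "e3"], "B": ["e1", "e3"]}
# Invented reads, each already assigned to one exon label.
reads = ["e1", "e2", "e2", "e3", "e1", "e3"]

union_count = count_reads(reads, union_exons(isoforms))
intersection_count = count_reads(reads, constitutive_exons(isoforms))
print(union_count, intersection_count)
```

The intersection count ignores the reads on the alternatively spliced exon, which is why this method uses fewer reads and loses power, while the union count treats all exons as if they belonged to one transcript.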

Differential expression
  • What is a statistically significant difference between a set of measurements (expression of a gene) of two populations (conditions)?
  • First, estimate variability
      • Observe biological variability (needs large numbers of replicates to sample the population)
      • Model biological variability: model the count variance across replicates as a nonlinear function of the mean counts using various parametric approaches (such as the normal and negative binomial distributions) (edgeR, DESeq, Cuffdiff)
  Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
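
One common way to express "variance as a nonlinear function of the mean" is the negative binomial relation variance = mean + dispersion × mean². The sketch below is only an illustration of that relation with a naive per-gene method-of-moments estimate. It is not how edgeR, DESeq, or Cuffdiff estimate dispersion (those tools share information across genes), and the function names and replicate counts are hypothetical.

```python
from statistics import mean, variance


def nb_variance(mu, dispersion):
    """Negative binomial variance for a given mean and dispersion."""
    return mu + dispersion * mu ** 2


def moment_dispersion(counts):
    """Naive method-of-moments dispersion estimate from replicate counts.

    Solves var = mu + dispersion * mu**2 for the dispersion, clamped at 0
    so that data no noisier than Poisson gives a dispersion of zero.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance, needs at least 2 replicates
    return max(0.0, (var - mu) / mu ** 2)


# Invented counts for one gene in three replicates of one condition.
replicates = [80, 100, 120]
dispersion = moment_dispersion(replicates)
print(dispersion, nb_variance(mean(replicates), dispersion))
```

With only a few replicates such a per-gene estimate is very unstable, which is the practical reason the listed tools model the mean-variance relationship across many genes instead.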

Three things to remember
  • RNAseq captures a larger dynamic range (more sensitive)
  • Additional information compared to arrays (e.g. isoforms)
  • Need to make assumptions/compromises (quantification, few replicates)
  [cabbit]

Next Weeks:
  • NGS Discussion group: Jake's topic
  • Two Weeks: Abstract: This session will focus on identifying SNPs from whole genome, exome capture or targeted resequencing data. The approaches of mapping, local realignment, recalibration, SNP calling, and SNP recalibration will be introduced and quality metrics discussed.

Editor's Notes

  • #2
    http://2.bp.blogspot.com/_BPr6hpMG0tg/TSZdkYDcRvI/AAAAAAAAAjY/ReScIkWNySg/s1600/drink.jpg
    http://www.sciencemag.org/content/291/5507/1260.full?sid=23d07e07-ccc5-4b15-8e6d-934a02e9580c
    http://biostar.stackexchange.com/questions/6638/rna-seq-analysis