Comparison between RNASeq and Microarray for Gene Expression Analysis


Published in: Technology
  • The basic concept behind the use of GeneChip arrays for gene expression is simple: labeled cDNA or cRNA targets derived from the mRNA of an experimental sample are hybridized to nucleic acid probes attached to a solid support. By monitoring the amount of dye label associated with each DNA location, it is possible to infer the abundance of each mRNA species represented. For transcriptome profiling, the input is usually about 1 ug of total RNA, which is poly-A selected to ensure that only mature mRNA is being assayed.
  • Poly(A)+ mRNA is purified, fragmented, and then converted to a cDNA library with 5′ and 3′ adapter sequences. Short sequence reads are generated from the cDNA library. Normally, reads are first mapped to previously annotated, known transcripts, and the pile of unmapped reads is kept. Those remaining reads are then mapped to novel expressed sequences, including alternative exons and the corresponding splice junction sequences.
  • Two RNA sample types, MAQC brain and Universal Human Reference RNA, were processed in 5 technical replicates on both microarray and RNA-Seq. Once the data was generated, the microarray data was processed using MAQC. For RNA-Seq, the sample cDNA libraries were prepared with the Illumina protocol and sequenced to a depth of ~30 million mapped reads.
  • This is the scatter plot of technical replicates of the samples analyzed by RNA-Seq and microarray. The false positive rates are comparable between the two methods, and both methods have extremely high correlation between replicates (R>0.99). The plots demonstrate that RNA-Seq identifies more genes and spans a wider dynamic range compared to the microarray.
  • Scatterplot of fold change per gene as measured by RNASeq and microarray. Genes identified as differentially expressed by both platforms are plotted in red, genes identified only by RNASeq in blue, only by microarray in yellow, and by neither in green. While the correlation between the two platforms in identifying differentially expressed genes is high, this figure clearly indicates a discrepancy between the platforms in the ability to identify genes as differentially expressed. The gene subset segmentation reveals that RNA-Seq counts identified significantly more differentially expressed genes. However, microarray does detect gene expression differences. In further validation on a subset of 1000 genes for which PCR data is available, RNASeq data shows higher concordance with PCR results than microarray.
  • A study by Mooney et al. used a paired RNA sequencing (RNA-Seq)/microarray analysis of a set of 4 normal canine lymph nodes and 10 canine lymphoma fine needle aspirates to identify technical biases and variation between the technologies, comparing the 15,092 annotated genes on the chip.
  • Both RNA-Seq and microarray observations provide present detection calls for 15,092 genes in each of the 14 samples. The percent present detection calls provided by the two technologies agreed with high frequency (73%) and were statistically associated (Table 3; p < 10^-15, odds ratio > 40). Among genes probed by both methods, percent present detection frequencies of 69% and 44% were obtained by RNA-Seq and microarray, respectively. Among genes called present using microarray, over 97% were detected using RNA-Seq. Variation among expression profiles obtained using RNASeq is similar to that obtained using microarray after removing contributions of the first surrogate variable [42]. Each letter denotes a sample from a dog having a normal (N), B-cell (B), or T-cell (T) diagnosis as in the legend, with subscript ‘m’ run on the microarray platform and subscript ‘r’ run on the NGS platform. a) Principal component scores. b) Hierarchical clustering.
  • Here, we compare these two platforms using matched samples of poly(A)-enriched RNA isolated from C. elegans at the second larval (L2) and young adult (YA) stages. The plot covers all genes; each point represents a gene from the composite model. RNA-Seq expression levels per gene were measured using RPKM, and tiling array levels were measured using the mean intensity of probes falling within composite exons. The Spearman's coefficient is 0.90, indicating that the platforms correlate well on identical samples. The disproportionate number of genes in the upper left likely represents cross-hybridization.
  • Differential expression of genes between the L2 and YA stages. (a) Correlation of log2(YA/L2) ratios between RNA-Seq and tiling arrays. Black: not significantly differentially expressed between samples. Blue: significantly differentially expressed (q ≤ 0.01). The ratio of expression levels is well correlated, but RNA-Seq has a larger dynamic range. (b) Venn diagram of genes called differentially expressed by each platform. There is significant overlap (8,976) between the two platforms, but more genes were called differentially expressed by RNA-Seq (14,201) than by tiling arrays (10,283), likely reflecting its greater dynamic range. A total of 4,326 genes were not called differentially expressed by either technology.
  • ROC curve analysis. Black: tiling array. Red: RNA-Seq with all 32 million reads. RNA-Seq substantially outperforms the tiling array, with consistently higher sensitivity at lower FPR. The remaining curves are for RNA-Seq with only a subset of reads used. At an FPR of 0.05, just 4 million reads (blue) are required to attain the same sensitivity as two tiling array replicates.
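
The RPKM measure and the Spearman coefficient mentioned in the C. elegans notes above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names and the toy gene values are hypothetical, and RPKM is taken in its usual form, read count × 10^9 / (gene length in bp × total mapped reads).

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data: (read count, gene length in bp) per gene,
    # and a made-up mean probe intensity for the same genes.
    total_mapped_reads = 32_000_000
    genes = [(1000, 2000), (50, 1500), (12000, 3000), (300, 900)]
    array_intensity = [8.1, 5.2, 11.4, 7.0]

    rnaseq_levels = [rpkm(c, l, total_mapped_reads) for c, l in genes]
    print("RPKM per gene:", rnaseq_levels)
    print("Spearman coefficient:", spearman(rnaseq_levels, array_intensity))
```

Because Spearman's coefficient depends only on ranks, it tolerates the different scales and dynamic ranges of the two platforms, which is presumably why it suits a cross-platform comparison like this one.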

    1. Yaoyu E. Wang, Ph.D., Center for Cancer Computational Biology, DFCI. SPECSII webinar, June 05, 2013
    2. - Transcriptome profiling represents a static gene expression state of a biological sample across the genome
       - Allows for direct genomic comparisons across multiple samples to determine genes that exhibit differential expression in different states (e.g. normal vs. tumor)
       - Allows for hypothesis generation on molecular abnormalities and mechanisms that may contribute to the tumor phenotype
       - Provides information on molecular subtypes and the development of prognostic and predictive molecular signatures
       - Two main technologies:
         a. Microarray
         b. RNA-Sequencing (RNASeq) using next generation sequencing
    3. Affymetrix GeneChip scanner
    4. Blencowe B J et al. Genes Dev. 2009;23:1379-1386. Illumina HiSeq
    5. .bcl files: raw files containing base calls and quality scores
       CASAVA processing:
       - Demultiplexing
       - Fastq file generation
       - Sequencing filtering (Illumina defined quality filters)
       Split into Project and Sample Folders, each with its own Fastq files:
       - Jones_Lab: ChIP_A, ChIP-B
       - Marcus_Lab: RNA-SeqA, RNA-SeqB, RNA-SeqC
       - Williams_Lab: Exome1, Exome2
    6. Haas & Zody. Nature Biotechnology 28, 421–423 (2010)
       Using known annotations, and compare to known annotations:
       - Differential Expression
       - Differential Isoform Abundance
       - RNA editing
       - SNP, indel detection
    7. Technology | RNASeq | Microarray
       High run-to-run reproducibility | Yes | Yes
       Dynamic range comparable to actual transcript abundance | >8000-fold | Hundred-fold
       Able to detect alternative splice sites and novel isoforms | Yes | No
       De novo analysis of samples without reference genome | Yes | No
       Multiplexing samples in one run | Yes | No
       Required amount of total RNA | >100 ng | ~1 ug
       Re-analyzable data | Yes | No
    8. Technology | RNASeq | Microarray
       Heterogeneity of read coverage across an expressed region | Yes | No
       Well understood sources of experimental bias | No | Yes
       Data portable on a flash drive (~4G) | No | Yes
       Data is analyzable by any PC | No | Yes
       Cheaper cost per sample | No(?) | Yes(?)
    9. RNA-Seq Experiment, GEO Database
    10. White paper, Illumina
    11. White paper, Illumina
    12. Comparing Expression Profiles from Microarrays to RNASeq. n=7532, n=4537
    13. Mooney M, PLoS ONE (2013)
        - 10 Lymphoma (3 T-cell, 7 B-cell)
        - 4 Normal lymph node
        - Total RNA
        - PE100 run
        - 50-100 million mapped reads
        - Compare 15,092 annotated genes on chip
    14. Mooney M, PLoS ONE (2013). Legend: T, N, B. r=0.6; p<10^-15
    15. C. elegans biological replicates for L2 and YA stages: Affy Tiling Arrays* and Illumina RNASeq. Agarwal, BMC Genomics (2010)
        * Covers whole C. elegans genome
    16. Differential expression of genes between the L2 and YA stages. Agarwal, BMC Genomics (2010)
    17. RNA-Seq and tiling arrays. Labels: Tiling Array, Microarray Maximum Sensitivity, RNASeq 11-plex, RNASeq 6-plex. Agarwal, BMC Genomics (2010)
    18. Per Sample | Microarray | Illumina HiSeq
        1 per Chip/Lane | $670 | $4,010.00
        2-plex | NA | $2,097.50
        4-plex | NA | $1,141.25
        6-plex | NA | $822.50
        8-plex | NA | $663.13
        Annotations: 6-plex, 11-plex
    19. Per Sample | Microarray | Illumina HiSeq
        1 per Chip/Lane | $670 | $4,010.00
        2-plex | NA | $2,097.50
        4-plex | NA | $1,141.25
        6-plex | NA | $822.50
        8-plex | NA | $663.13
    20. Technology | Data Per Sample | Time to download 1 Sample | Time to download 100 samples | Cost to Store on the Cloud per Month
        RNASeq | 30-65GB | 1 hr | 6 days | $270
        Microarray | 30MB | 5 seconds | 8 minutes | $0.30
    21. - Applications with a user interface for RNA-Seq analysis (e.g. Galaxy) can only handle very few samples
        - Knowledge of Linux servers, a scripting language, and a programming language is absolutely REQUIRED
        - Lack of detailed understanding of NGS technology and data leads to diverse bioinformatics tools with different characteristics
        Law WC, Voom!, Bioconductor (2013)
    22. The answer is Yes
        - Transcriptome profiles generated by microarray and RNASeq are strongly concordant
        - Microarray data generated in the last decades is durable
        - RNASeq offers a lot more biological information than microarray, and its data is re-analyzable
        - NGS is getting cheaper
        However, the devil is in the data
        - NGS data is a lot more expensive to store and analyze
        - Specialized computing infrastructure and personnel are required to take advantage of the information from NGS data
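
The per-sample HiSeq prices on slides 18 and 19 are consistent with a simple model: a fixed cost per lane shared by all multiplexed samples, plus a fixed cost per sample. The sketch below checks that reading. The two cost components ($3,825 per lane and $185 per sample) are assumptions back-calculated from the table; the slides do not state them, and the function names are hypothetical.

```python
import math

# Values copied from the slides.
MICROARRAY_COST = 670.00
SLIDE_HISEQ_COST = {1: 4010.00, 2: 2097.50, 4: 1141.25, 6: 822.50, 8: 663.13}

# Assumptions inferred from the table, not stated in the slides.
LANE_COST = 3825.00             # shared by every sample in the lane
PREP_COST_PER_SAMPLE = 185.00   # paid once per sample (e.g. library prep)


def hiseq_cost_per_sample(plex):
    """Modelled cost per sample when `plex` samples share one lane."""
    if plex < 1:
        raise ValueError("plex must be at least 1")
    return LANE_COST / plex + PREP_COST_PER_SAMPLE


def break_even_plex(target_cost):
    """Smallest plex level at which the modelled cost is <= target_cost."""
    if target_cost <= PREP_COST_PER_SAMPLE:
        raise ValueError("target is below the fixed per-sample cost")
    return math.ceil(LANE_COST / (target_cost - PREP_COST_PER_SAMPLE))


if __name__ == "__main__":
    for plex, slide_value in SLIDE_HISEQ_COST.items():
        model_value = hiseq_cost_per_sample(plex)
        # Allow one cent of slack for rounding in the slide.
        matches = abs(model_value - slide_value) <= 0.01
        print(f"{plex}-plex: model {model_value:.3f}, slide {slide_value}, match={matches}")
    print("Break-even plex vs. microarray:", break_even_plex(MICROARRAY_COST))
```

Under these assumed components, the modelled HiSeq cost first drops below the $670 microarray price at 8-plex, in line with the 8-plex row of the table ($663.13).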