An introduction to RNA-seq data analysis

AGRF in conjunction with EMBL Australia recently organised a workshop at Monash University Clayton. This workshop was targeted at beginners and biologists who are new to analysing Next-Gen Sequencing data. The workshop also aimed to provide users with a snapshot of bioinformatics and data analysis tips on how to begin to analyse project data. An introduction to RNA-seq data analysis was presented by AGRF Senior Bioinformatician Dr. Sonika Tyagi.
Presented: 1st August 2012

Published in: Technology

  1. 1. An introduction to RNA-seq data analysis. Sonika Tyagi, Australian Genome Research Facility, 1 August 2012
  2. 2. Outline • Transcriptomics using RNA-seq: applications • Gene expression profiling workflows • Design challenges
  3. 3. RNA sequencing (mRNA-seq or RNA-seq): “An experimental protocol that uses next-generation sequencing technologies to sequence the RNA molecules within a biological sample in an effort to determine the primary sequence and relative abundance of each RNA”
  4. 4. A typical RNA-seq experiment • Library preparation and sequencing • Bioinformatics analysis. Nature Reviews Genetics, November 2008; doi:10.1038/nrg2484
  5. 5. RNA-seq applications • Allele-specific expression: prevalence of transcribed SNPs • Fusion transcripts: e.g., in cancer • Abundance estimation: alternative splicing, RNA editing, novel transcripts • Gene expression profiling
  6. 6. Raw sequences (fastq files) → Quality control (QC) → Spliced read alignment → Transcripts reconstruction → Differential expression analysis → Biology
  7. 7. RNA-seq workflows (flowchart). Reference available? Options: annotated genome; annotated assembled/predicted transcriptome; de novo transcriptome assembly (de novo assembly or reference assisted). Steps: reads mapping; transcripts reconstruction; summarization (by CDS, exon, gene, splice junctions); tables of counts (digital expression); DE analysis; biology (GO/pathways)
  8. 8. Raw sequences (fastq files) → Quality control (QC) → Spliced read alignment → Transcripts reconstruction → Differential expression analysis → Biology
  9. 9. QC tools
  10. 10. Raw sequences (fastq files) → Quality control (QC) → Spliced read alignment → Transcripts reconstruction → Differential expression analysis → Biology
  11. 11. Alignments/mapping splice junctions
        • Unspliced read aligners: ideal for mapping reads against cDNA databases; splice junctions/events are not picked up. Examples: seed methods (MAQ, Stampy, ELAND); Burrows-Wheeler methods (BWA, Bowtie)
        • Spliced read aligners: novel splice junctions can be detected; perform better for polymorphic regions and aligning pseudogenes. Examples: exon-first methods (TopHat, MapSplice, SpliceMap); seed-extend methods (GSNAP, QPALMA, ELANDv2e)
  12. 12. Raw sequences (fastq files) → Quality control (QC) → Spliced read alignment → Transcripts reconstruction → Differential expression analysis → Biology
  13. 13. Transcripts reconstruction • Genome guided. Examples: G.Mor.Se (short reads), Cufflinks and Scripture (for long reads) • Genome independent. Examples: Trans-ABySS, Velvet+Oases, MIRA, Cufflinks*
  14. 14. Genome guided transcriptome assembly
  15. 15. Genome guided transcriptome assembly. Martin J and Wang Z, Nat Rev Genet 2011; doi:10.1038/nrg3068
  16. 16. Raw sequences (fastq files) → Quality control (QC) → Spliced read alignment → Transcripts reconstruction → Differential expression analysis → Biology
  17. 17. Normalisation and DE
        • Normalisation: library size, RPKM, FPKM, TMM, upper quartile. Examples: ERANGE, Cuffdiff, edgeR, Myrna
        • DE models: Poisson GLM (examples: DEGseq, Myrna); negative binomial (examples: edgeR, baySeq, Cuffdiff)
  18. 18. Quantification and normalisation: (1) Digital expression or raw count: number of reads mapping to a region (exon/transcript/novel region); (2) Normalised counts*: number of reads per million reads per kb; (3) Splice junction detection; (4) Compare to existing gene models. Nat Meth 2008; doi:10.1038/NMETH.1226
  19. 19. Differential expression • Normalised gene expression value as RPKM: reads per kilobase of exon model per million mapped reads • Or FPKM: fragments per kilobase of exon model per million mapped reads • Compare RPKM/FPKM across conditions or tissues. Nat Meth; doi:10.1038/NMETH.1226
  20. 20. Raw sequences (fastq files) → Quality control (QC) → Spliced read alignment → Transcripts reconstruction → Differential expression analysis → Biology
  21. 21. Systems biology: beyond the list of DE genes • Ontologies: GO enrichment, goseq (R package) • DAVID (http://david.abcc.ncifcrf.gov) • Pathway analysis
  22. 22. RNA-seq experiment design challenges • NGS biases: library prep (GC content, 5’ or 3’ depletion, random hexamer primers, RNA species, bias towards 3’ end …); transcript length • Sequencing depth • Single or paired end • Biological or technical replicates • Validation. Briefings in Bioinformatics, vol 12, no 3, 280-287
  23. 23. RNA-seq and other transcriptomics methods. Nature Reviews Genetics, November 2008; doi:10.1038/nrg2484
  24. 24. Summary • RNA-seq: more versatile and comprehensive, with superior reproducibility and resolution. • Not dependent on prior sequence information: suitable for non-model organisms. • Potentially provides information for all RNA species in the cell and allows discovery of novel ones. • Still an actively developing field, with research areas that need refinement. • Gold standards for experimental design and validation are yet to be set.
  25. 25. TopHat-Cufflinks pipeline reference: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3), 562-78.
  26. 26. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 7(3), 562-78.
  27. 27. R/Bioconductor based RNA-seq packages • edgeR • voom • DESeq. http://bioconductor.org/packages/release/BiocViews.html#___Software
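
The RPKM definition on slides 18 and 19 (reads per kilobase of exon model per million mapped reads) comes down to simple arithmetic. A minimal Python sketch, assuming read counts have already been summarised per gene; the function name `rpkm`, the gene names and all numbers are made up for illustration and are not from the workshop:

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) gives the factor 1e9.
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical toy data: two genes with the same raw count but different lengths.
counts = {"geneA": 500, "geneB": 500}
exon_lengths_bp = {"geneA": 1000, "geneB": 2000}
total_mapped_reads = 10_000_000

values = {
    gene: rpkm(counts[gene], exon_lengths_bp[gene], total_mapped_reads)
    for gene in counts
}

for gene, value in values.items():
    print(f"{gene}: RPKM = {value:.1f}")
```

With equal raw counts, the gene with the longer exon model gets the lower value, which is the length correction the slides describe. For paired-end data, the same arithmetic applied to fragments instead of reads gives FPKM.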

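Slide 17 lists library size and upper quartile among the normalisation options. As a rough illustration of the idea only (not how Myrna or any other tool named on the slide implements it), the sketch below scales each sample's counts by the upper quartile of its non-zero counts. The helper `upper_quartile`, the sample names and the toy counts are hypothetical, and the quartile uses the "inclusive" method of Python's `statistics.quantiles`, which is one of several conventions:

```python
from statistics import quantiles


def upper_quartile(counts):
    """75th percentile of the non-zero counts in one sample."""
    nonzero = [c for c in counts if c > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two non-zero counts")
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    return quantiles(nonzero, n=4, method="inclusive")[2]


def scale_by_upper_quartile(counts):
    """Divide every count in a sample by that sample's upper quartile."""
    uq = upper_quartile(counts)
    return [c / uq for c in counts]


# Hypothetical toy data: sample_b is sample_a sequenced twice as deeply.
sample_a = [0, 10, 20, 30, 40]
sample_b = [0, 20, 40, 60, 80]

uq_a = upper_quartile(sample_a)
uq_b = upper_quartile(sample_b)
scaled_a = scale_by_upper_quartile(sample_a)
scaled_b = scale_by_upper_quartile(sample_b)

print("upper quartiles:", uq_a, uq_b)
print("scaled A:", scaled_a)
print("scaled B:", scaled_b)
```

After scaling, the two toy samples line up despite the twofold difference in depth. Library-size normalisation follows the same pattern with the sample's total count in place of the upper quartile.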