RNA-Seq Analysis: Everything You Always Wanted to Know...and then some
Amit Sinha, PhD
amit@basepairtech.com
Founder: www.basepairtech.com
Instructor: Harvard Med School
Overview
1. Introduction
2. Analysis Steps
3. Computing options
4. Demo
5. Q&A
Why RNA-Seq?
Unbiased measurement of transcription, without probes or prior knowledge of the transcripts
Applications
Differential expression
Novel transcripts, splice junctions, gene fusions
Variants
Raw data after sequencing
• FASTQ format
• Paired-end runs produce 2 files per sample
• 1-5 GB each
@SOLEXA2:5:1:3:169#0/1
CCAAATAATAGTTTGTTTTTTTGATATCTATA
+
aa``_b__a`b`aaa`aaaaaaa_a_a`a`a`
@SOLEXA2:5:1:3:1063#0/1
CAGTTCTTAAAGCTCACAAAGATGGTTTGAAA
+
bbbaabbbbbababbbbaaaba```bbaaaa
@SOLEXA2:5:1:3:902#0/1
GGCATAACATATCTTCCAAATCCATGTATTTC
+
aabbbbbbbbbbbbbbbb`bbbbbabaabaab
@SOLEXA2:5:1:3:1072#0/1
AAGAAACAGAACTTGAATTTTCTTTAACTCAC
+
baaaaaaaaaaaaaa``aaaaaaaaaaa`aaa
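Each FASTQ record above is four lines: an `@` identifier (the `/1` suffix marks the first read of a pair in this older naming style), the bases, a `+` separator, and one quality character per base. The quality characters shown (`_`, `` ` ``, `a`, `b`) fall in the range of the older Illumina 1.3+ encoding, Phred+64; current Illumina data use Phred+33. The sketch below is a minimal reader under those assumptions; the names `read_fastq` and `phred_scores` are hypothetical, and real projects usually rely on an established parser.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # identifier line without the leading '@'
    sequence: str  # called bases
    quality: str   # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence and quality lengths differ in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 64) -> List[int]:
    """Decode quality characters to Phred scores (offset 64 = older Illumina 1.3+ encoding)."""
    return [ord(ch) - offset for ch in quality]


# First record from the slide above.
example = "@SOLEXA2:5:1:3:169#0/1\nCCAAATAATAGTTTGTTTTTTTGATATCTATA\n+\naa``_b__a`b`aaa`aaaaaaa_a_a`a`a`\n"
for record in read_fastq(io.StringIO(example)):
    scores = phred_scores(record.quality)
    print(record.header, len(record.sequence), min(scores), max(scores))
```

For Phred+33 files, call `phred_scores(quality, offset=33)`; guessing the wrong offset shifts every score by 31.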
Differential expression pipeline
[Figure: pipeline from alignment to differential expression]
Steps in RNA-Seq analysis
QC
• Check sequencing quality and rule out sample contamination
Alignment
• Map short reads to the genome
Metrics
• Confirm that your reads map to coding regions
Transcript count
• Count the number of reads on each transcript
Differential expression
• Test the statistical significance of differences in expression counts
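Because the significance test in the last step runs once per gene, thousands of p-values are produced, and differential-expression tools typically report multiple-testing-adjusted values. A common correction is Benjamini-Hochberg false discovery rate control. The sketch below is a plain implementation of that procedure, not code taken from Cuffdiff, DESeq or voom; the function name is hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes.
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [i for i, q in enumerate(padj) if q < 0.05]
```

The backward pass with a running minimum is what keeps the adjusted values monotone in the rank of the raw p-values.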
Tools for RNA-Seq analysis
QC
• FastQC, FASTX-Toolkit
Alignment
• TopHat, STAR
Metrics
• Picard, CummeRbund (plots of Cuffdiff output)
Transcript count
• Rsamtools, htseq-count, featureCounts
Differential expression
• Cuffdiff, DESeq, voom (limma)
QC
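FastQC's per-base sequence quality plot summarizes the distribution of quality scores at each read position across all reads. The sketch below computes only the per-position mean and flags positions below a cutoff; the function names and the default threshold of 20 are illustrative assumptions, and the default offset of 33 assumes modern Phred+33 data (use `offset=64` for older files such as the SOLEXA records shown earlier).

```python
from typing import List


def per_position_mean_quality(qualities: List[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position; reads may differ in length."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in qualities:
        for pos, ch in enumerate(qual):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def low_quality_positions(means: List[float], threshold: float = 20.0) -> List[int]:
    """1-based positions whose mean quality falls below a hypothetical cutoff."""
    return [pos + 1 for pos, q in enumerate(means) if q < threshold]
```

A steady drop in mean quality toward the 3' end of reads is common on Illumina runs and is one reason reads are sometimes trimmed before alignment.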
Alignment
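Spliced aligners such as TopHat and STAR record a read that spans an intron with an `N` (skipped region) operation in the SAM/BAM CIGAR string, which is how splice junctions are recovered from alignments. The sketch below walks a single POS/CIGAR pair and returns the implied intron coordinates; the function name is hypothetical, and reading a real BAM file (for example with pysam) is not shown.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def splice_junctions(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Introns implied by 'N' operations, as 1-based inclusive (start, end) pairs.

    pos is the 1-based leftmost aligned reference position, as in the SAM POS field.
    """
    if not re.fullmatch(r"(?:\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")  # e.g. '*' for unmapped
    junctions = []
    ref = pos
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "N":
            junctions.append((ref, ref + length - 1))  # skipped region = intron
        if op in REF_CONSUMING:
            ref += length
    return junctions


# A read with 50 aligned bases, a 1000-base intron, then 50 more aligned bases.
print(splice_junctions(100, "50M1000N50M"))
```

Soft clips (`S`) and insertions (`I`) do not consume reference positions, so they shift neither the read's start nor the junction coordinates.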
Post-alignment QC
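The Metrics step checks that most reads land in exons rather than in introns or intergenic regions; Picard's CollectRnaSeqMetrics reports this kind of breakdown. The sketch below is a deliberately simplified version that classifies each read by a single position (for example its alignment start) against exon intervals on one chromosome. The function name and inputs are hypothetical; real tools work base by base from BAM files and a gene annotation.

```python
from bisect import bisect_right
from typing import List, Tuple


def exonic_fraction(read_positions: List[int], exons: List[Tuple[int, int]]) -> float:
    """Fraction of read positions falling inside any exon (1-based, inclusive intervals)."""
    # Merge overlapping or adjacent exons so a binary search finds at most one candidate.
    merged: List[List[int]] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [s for s, _ in merged]
    hits = 0
    for p in read_positions:
        i = bisect_right(starts, p) - 1  # last merged exon starting at or before p
        if i >= 0 and p <= merged[i][1]:
            hits += 1
    return hits / len(read_positions) if read_positions else 0.0
```

A low exonic fraction can point to DNA contamination or many unspliced pre-mRNA reads, which is the kind of problem this step is meant to catch.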
Read counts
Gene Count
TP53 456
BRCA1 897
HOXA9 765
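A count table like the one above comes from assigning each aligned read to a gene. The sketch below uses a simple rule: a read is counted only if its aligned span overlaps exactly one gene, and reads that overlap no gene or several genes are tallied separately, loosely mirroring the `__no_feature` and `__ambiguous` totals that htseq-count reports. It ignores exon structure, strand and chromosome, and the gene names and coordinates in the usage are hypothetical.

```python
from typing import Dict, List, Tuple


def count_reads_per_gene(
    reads: List[Tuple[int, int]], genes: Dict[str, Tuple[int, int]]
) -> Tuple[Dict[str, int], int, int]:
    """Count reads whose aligned span overlaps exactly one gene (same chromosome assumed).

    Returns (counts, no_feature, ambiguous). Coordinates are 1-based inclusive.
    """
    counts = {name: 0 for name in genes}
    no_feature = ambiguous = 0
    for r_start, r_end in reads:
        hits = [name for name, (g_start, g_end) in genes.items()
                if r_start <= g_end and g_start <= r_end]  # interval overlap test
        if not hits:
            no_feature += 1
        elif len(hits) > 1:
            ambiguous += 1
        else:
            counts[hits[0]] += 1
    return counts, no_feature, ambiguous


# Hypothetical genes and read spans.
genes = {"geneA": (100, 500), "geneB": (450, 900)}
reads = [(120, 170), (460, 510), (600, 650), (1000, 1050)]
print(count_reads_per_gene(reads, genes))
```

Discarding ambiguous reads rather than splitting them keeps the counts as integers, which is what count-based methods such as DESeq expect as input.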
Differential expression
[Figure: heatmap and volcano plot]
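Before groups are compared, raw counts have to be normalized for sequencing depth. DESeq's approach is the median-of-ratios size factor: each sample's factor is the median, over genes, of that sample's count divided by the gene's geometric mean across samples. The sketch below implements that normalization in log space and then computes a simple log2 fold change of group means with a pseudocount. That fold change is an illustrative stand-in, not the model-based estimate DESeq reports, and all function names are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Sequence


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factor for each sample (all rows share one sample order)."""
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue  # geometric mean would be zero, so the gene is skipped
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts: Dict[str, List[int]], factors: List[float]) -> Dict[str, List[float]]:
    """Divide each sample's counts by its size factor."""
    return {gene: [c / s for c, s in zip(row, factors)] for gene, row in counts.items()}


def log2_fold_change(row: Sequence[float], group_a: List[int], group_b: List[int],
                     pseudocount: float = 1.0) -> float:
    """log2 of (mean of group B + pseudocount) / (mean of group A + pseudocount)."""
    mean_a = sum(row[i] for i in group_a) / len(group_a)
    mean_b = sum(row[i] for i in group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
```

Using the median makes the size factors robust to a minority of genes that are truly differentially expressed. On a volcano plot, the log2 fold change is the horizontal axis and -log10 of the p-value is the vertical axis.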
Downstream analysis
[Figure: PCA plot of principal component 2 vs. principal component 3 for LSK, CMP, MEP, GMP, LGMP-GMP and LGMP-HSC samples]
Gene set enrichment
Clustering (PCA)
Network analysis
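Gene set enrichment asks whether a set of genes, such as a pathway, is over-represented among the differentially expressed genes. One simple form is an over-representation test based on the hypergeometric distribution (equivalent to a one-sided Fisher's exact test). This is a different method from the GSEA running-sum statistic, and the slide does not say which approach it uses. The function name and the numbers in the usage are hypothetical.

```python
from math import comb


def enrichment_pvalue(overlap: int, set_size: int, n_de: int, universe: int) -> float:
    """One-sided hypergeometric P(X >= overlap).

    universe: genes tested; set_size: genes in the gene set; n_de: DE genes;
    overlap: DE genes that are also in the gene set.
    """
    upper = min(set_size, n_de)
    tail = sum(comb(set_size, k) * comb(universe - set_size, n_de - k)
               for k in range(overlap, upper + 1))
    return tail / comb(universe, n_de)  # exact integer arithmetic until this division


# Hypothetical: 20000 genes tested, a 200-gene set, 500 DE genes, 15 of them in the set.
p = enrichment_pvalue(15, 200, 500, 20000)
```

Under chance alone the expected overlap here is 200 × 500 / 20000 = 5 genes, so an observed overlap of 15 is what the test weighs. When many gene sets are tested, their p-values also need a multiple-testing correction.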
Options for analysis
Desktop
• High control
• Low computing power
• Small storage space
• Can’t handle large loads
Local server
• More computing and storage
• Low cost
• Needs technical expertise
• Quota limits
Web-based
• Scalable
• Cost-effective
• Higher uptime
• No direct access
Analysis scripts for the server
Web-based software
1. Upload data
2. Choose workflow
3. Run analysis
Free trial for webinar attendees
https://www.basepairtech.com
code: webinar-101
Questions?


Editor's Notes

  1. If you have 10 samples in your project, you could be looking at about 50 gigabytes of data. How do you get from there to a list of differentially expressed genes?