RNA-seq: Generating Views of 
the Transcriptome 
Sean Davis, M.D., Ph.D. 
Genetics Branch, Center for Cancer Research 
National Cancer Institute 
National Institutes of Health
Normal 
Karyotype 
Tumor 
Karyotype
The Central Dogma
Patient and 
Population 
phenotype 
Characteristics 
Gene Copy 
Number 
Sequence 
Variation 
Chromatin 
Structure and 
Function 
Gene 
Expression 
Transcriptional 
Regulation 
DNA 
Methylation
Your Nature Paper
Overview 
• Quality Control 
• Alignment and Assembly 
• Transcript Quantification 
• Visualization 
• Differential Expression 
• Experimental Design 
RNA-seq protocol schematic
Approaches to RNA-seq 
Nature Biotech (2010) 28, 421-423
Quality Control 
• Specialized RNA-seq quality control software 
• Samples should be “similar” 
• No “absolute” cutoffs for good vs. bad samples 
• Visualize data in as many ways as necessary 
(browser, plots, sample similarity plots like 
MDS, etc.) 
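One way to check that samples are "similar" is to compare them pairwise on log-transformed counts. The following is a minimal sketch, not the method of any particular QC package: the sample names, the counts, and the choice of Pearson correlation on log2(count + 1) are all assumptions made for illustration.

```python
import math

def log2_counts(counts):
    """Log-transform raw counts; the +1 pseudocount avoids log2(0)."""
    return [math.log2(c + 1) for c in counts]

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def sample_similarity(samples):
    """Return {(sample_a, sample_b): correlation} for every pair of samples."""
    names = sorted(samples)
    logged = {name: log2_counts(samples[name]) for name in names}
    return {(a, b): pearson(logged[a], logged[b]) for a in names for b in names}

# Hypothetical per-gene read counts for three samples (same gene order in each).
samples = {
    "normal_1": [10, 200, 35, 0, 80],
    "normal_2": [12, 180, 40, 1, 75],
    "tumor_1": [90, 20, 300, 15, 5],
}
sim = sample_similarity(samples)
for pair in [("normal_1", "normal_2"), ("normal_1", "tumor_1")]:
    print(pair, round(sim[pair], 3))
```

A matrix like this can feed a heatmap or an MDS plot. Consistent with the slide, there is no absolute cutoff: a sample is suspicious when it correlates poorly with the samples it should resemble.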
Alignment
RNA-seq Alignment
From: https://research.fhcrc.org/mcintosh/en/tools.html
Transcript Quantification
Models for RNA-seq 
• Count-based models 
• Multi-reads (isoform resolution) 
• Paired-end reads (include length resolution 
step) 
• Positional bias along transcript length 
• Sequence bias
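The multi-read problem above can be illustrated with a small expectation-maximization (EM) loop that shares ambiguous reads among the isoforms they align to. This is a sketch under simplifying assumptions (no positional or sequence bias, fixed effective lengths, a uniform starting point) and is not the implementation used by any specific quantification tool. The function name `em_abundance` and the toy data are hypothetical.

```python
def em_abundance(read_compat, lengths, n_iter=200):
    """Estimate the fraction of reads originating from each transcript.

    read_compat: one tuple per read, listing the transcripts it aligns to.
    lengths: dict mapping transcript name to (effective) length in bp.
    """
    transcripts = sorted(lengths)
    # Start from a uniform guess of read fractions.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read among its compatible transcripts in
        # proportion to theta[t] / length[t].
        assigned = {t: 0.0 for t in transcripts}
        for compat in read_compat:
            weights = {t: theta[t] / lengths[t] for t in compat}
            total = sum(weights.values())
            for t, w in weights.items():
                assigned[t] += w / total
        # M-step: new read fractions are the normalized fractional counts.
        n_reads = sum(assigned.values())
        theta = {t: assigned[t] / n_reads for t in transcripts}
    return theta

# Toy data: 60 reads unique to isoform A, 20 unique to B, 20 shared by both.
reads = [("A",)] * 60 + [("B",)] * 20 + [("A", "B")] * 20
theta = em_abundance(reads, {"A": 1000, "B": 1000})
print({t: round(v, 3) for t, v in theta.items()})
```

With equal lengths, the shared reads end up split in the same 3:1 ratio as the unique reads, rather than being discarded or counted twice.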
Read Counting
L. Pachter (2011) arXiv:1104.3889v
An Example of Sequencing Bias 
Hansen (2010), NAR
Sample-specific Sequence Bias
Transcript Quantification Models
Result of Quantification
Clustering and Visualization
Distance Metrics 
• Euclidean distance 
• Manhattan distance 
• Minkowski distance (generalized distance)
Hierarchical Clustering 
Gene 1 
Gene 2 
Gene 3 
Gene 4 
Gene 5 
Gene 6 
Gene 7 
Gene 8
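The distance metrics and the clustering procedure can be sketched together. The Minkowski distance with exponent p reduces to the Manhattan distance at p = 1 and the Euclidean distance at p = 2. The clustering below is a plain agglomerative procedure with average linkage, which is an assumption (the slides do not name a linkage), and the expression profiles are invented for illustration.

```python
def minkowski(x, y, p=2):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def average_linkage(c1, c2, profiles, p):
    """Mean pairwise distance between the members of two clusters."""
    dists = [minkowski(profiles[i], profiles[j], p) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def hierarchical_clustering(profiles, p=2):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles, p)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        # Replace the two clusters with their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical expression profiles (one value per sample) for four genes.
profiles = {
    "Gene 1": [1.0, 1.2, 0.9],
    "Gene 2": [1.1, 1.0, 1.0],
    "Gene 3": [5.0, 5.2, 4.8],
    "Gene 4": [5.1, 4.9, 5.0],
}
merges = hierarchical_clustering(profiles, p=2)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

The merge order is what a dendrogram draws: each merge becomes a join, and the merge distance sets its height. This brute-force search is fine for a handful of genes but too slow for a full transcriptome.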
Differential Expression
MA Plot
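An MA plot shows, for each gene, the log ratio between two conditions (M) against the average log expression (A). A minimal sketch of computing the two coordinates follows. The pseudocount of 1 and the sign convention (M as second condition minus first) are assumptions, and in practice the counts would be normalized for library size first.

```python
import math

def ma_values(counts_a, counts_b, pseudocount=1.0):
    """Return a list of (M, A) pairs, one per gene.

    M = log2(b) - log2(a)        (log fold change, condition b over a)
    A = (log2(a) + log2(b)) / 2  (mean log expression)
    """
    result = []
    for a, b in zip(counts_a, counts_b):
        log_a = math.log2(a + pseudocount)
        log_b = math.log2(b + pseudocount)
        result.append((log_b - log_a, (log_a + log_b) / 2))
    return result

# Hypothetical counts for three genes in a control and a treated sample.
control = [3, 100, 50]
treated = [15, 100, 25]
for m, a in ma_values(control, treated):
    print(f"M = {m:+.2f}, A = {a:.2f}")
```

Genes with M near 0 are unchanged. The spread of M is typically widest at low A, where counts are small and noisy.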
DE Software Runtime
RNA-seq workflow as 
proposed by Anders et al. 
in Nature Protocols
Fusion Gene Detection
Fusion gene schematic
Fusion Detection
Other Applications 
• Alternative splicing 
• Isoform utilization 
• Functional annotation of genomic regions 
• Allele-specific expression 
• eQTL analysis 
• Classification problems (e.g., cancer with 
unknown primary) 
• … 
Experimental Design 
• What are my goals? 
– Differential expression? 
– Transcriptome assembly? 
– Identify rare, novel transcripts? 
• System characteristics? 
– Large, expanded genome? 
– Intron/exon structures complex? 
– No reference genome or transcriptome?
Experimental Design 
• Technical replicates 
– Probably not needed due to low technical variation 
• Biological replicates 
– Not explicitly needed for transcript assembly 
– Essential for differential expression analysis 
– Number of replicates often driven by sample 
availability for human studies 
– More is almost always better
Take Home Messages 
• Defining the experimental question(s) is 
critical 
• No gold-standard analysis workflows exist yet 
• Be aware that experimental biases are present in 
nearly all -omics datasets 
• Biological replicates are almost always 
beneficial (necessary) 
• RPKM/FPKM are for human consumption, not 
computation (generally) 
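The RPKM/FPKM point can be made concrete with a short sketch. RPKM divides each gene's read count by the gene length in kilobases and by the library size in millions of mapped reads. The gene counts and lengths below are invented, and the function name is hypothetical.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts)
    return [c * 1e9 / (total_reads * length)
            for c, length in zip(counts, lengths_bp)]

lengths = [1000, 2000, 1000]   # gene lengths in bp (hypothetical)
shallow = [10, 20, 70]         # counts from a shallow library
deep = [100, 200, 700]         # same proportions, ten times the depth

print(rpkm(shallow, lengths))
print(rpkm(deep, lengths))
```

Both libraries give the same RPKM values, although the deeper library supports its estimates with ten times as many reads. That lost information about sampling depth is one reason count-based differential expression methods generally start from raw counts rather than RPKM/FPKM.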
Links of Interest 
• http://bioconductor.org 
• http://biostars.org 
• http://www.rna-seqblog.com/ 
• https://genome.ucsc.edu/ENCODE/ 
• http://www.ncbi.nlm.nih.gov/gds/

RNA-seq Data Analysis Overview

Editor's Notes

  1. I am going to spend a few minutes illustrating how existing and emerging high-throughput genomic technologies are being used to understand cancer, a mind-numbingly complex and dysregulated biologic process.
  2. The first karyotypes were produced in 1956. Shown here is a comparison of a karyotype from a normal female and one from a tumor. By 1960, a karyotype of a cancer genome revealed the presence of the Philadelphia chromosome. Although this chromosome is now known to encode the BCR-ABL fusion protein, it was not until 2001, 41 years later, that a drug targeting the fusion product, Gleevec, became available. By applying high-throughput microarray technologies, the Cancer Genetics Branch is striving to make observations of the cancer genome that will provide a deeper understanding of the biology of cancer, to develop prognostic and diagnostic markers to improve patient-specific treatments, and to find promising targets for directed drug therapy.
  3. Since Knudson’s famous hypothesis proposing the two-hit model, our understanding of cancer as a genetic disease has progressed to the realization that cancer is not often a function of a single gene gone awry, but probably represents a complex interaction of multiple processes in the genome including altered copy number, gene expression, transcriptional regulation, chromatin modification, sequence variation, and DNA methylation. It is vital to the goal of producing better patient outcomes to understand not only what genes are involved in a certain type of cancer, but also how these other processes affect gene regulation. In short, an integrated view of the cancer genome is necessary and is now becoming possible.