RNA-Seq
• Application of next-generation sequencing (NGS) technology to RNA, for transcript identification and quantification.
• Can be used for:
– Estimating the abundance of transcripts in the sample (transcriptomics or expression profiling)
– Revealing sequence variation
– Detecting alternative splicing
– Comparing gene expression profiles of healthy versus diseased tissue
RNA-Seq vs Microarray
BMC Bioinformatics 2014, 15(Suppl 11):S3, DOI: 10.1186/1471-2105-15-S11-S3
Data Generation Steps
Nature Reviews Genetics 12, 671-682 (October 2011), DOI: 10.1038/nrg3068
RNA-Seq analysis Pipeline for Detecting
Differential Expression
Genome Biology 2010 11:220, DOI: 10.1186/gb-2010-11-12-220
Read-Mapping Challenges
• NGS computational challenges
– Memory footprint
– Millions of short reads
– RNA-Seq special mapping concerns
• New technology, old problems
– Exact vs inexact matches
[Figure source: Wikipedia]
Algorithms For Read Mapping
• Build an index
– Hash table
– Burrows-Wheeler transform (BWT); FM index
• Seed and extend
– Seed: find the set of positions where reads are most likely to align
– Extend: refine the alignment at the target locations
Hash Tables
• Use hash tables to store the positions of all k-mers in a genome
Genome (Chr 9): AATCGCATAGTTATTAATGCTA
AATCGCATAG - Chr 9, location 0
ATCGCATAGT - Chr 9, location 1
TCGCATAGTT - Chr 9, location 2
CGCATAGTTA - Chr 9, location 3
GCATAGTTAT - Chr 9, location 4
CATAGTTATT - Chr 9, location 5
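The k-mer lookup on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how any particular aligner stores its index: the function name `build_kmer_index` is made up for this example, and k = 10 is read off the k-mer length shown on the slide.

```python
from collections import defaultdict

def build_kmer_index(genome, k):
    """Map every k-mer in `genome` to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)

genome = "AATCGCATAGTTATTAATGCTA"  # sequence shown on the slide
index = build_kmer_index(genome, k=10)

# Looking up a k-mer returns its candidate locations in constant time.
print(index["GCATAGTTAT"])
```

A real genome index trades memory for speed here: storing every k-mer of a mammalian genome is what drives the memory footprint mentioned earlier.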
Burrows-Wheeler Transformation (BWT)
• Reversible transformation
• Repetitive nature of the outcome makes it easier to compress
Input String: GCTAGCTA
Rotations:
GCTAGCTA
CTAGCTAG
TAGCTAGC
AGCTAGCT
GCTAGCTA
CTAGCTAG
TAGCTAGC
AGCTAGCT
Sorted rotations:
AGCTAGCT
AGCTAGCT
CTAGCTAG
CTAGCTAG
GCTAGCTA
GCTAGCTA
TAGCTAGC
TAGCTAGC
Output String (last column of the sorted rotations): TTGGAACC
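The rotate, sort, take-the-last-column procedure can be written directly. This is a naive sketch that mirrors the slide and uses no end-of-string marker; practical implementations usually append a unique terminator so the transform can be inverted unambiguously, and they avoid materializing every rotation.

```python
def bwt(text):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    rotations = [text[i:] + text[:i] for i in range(len(text))]
    rotations.sort()
    return "".join(rotation[-1] for rotation in rotations)

output = bwt("GCTAGCTA")  # input string from the slide
print(output)
```

The output groups identical characters into runs, which is the property that makes it easy to compress.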
Seed and Extend
Read:   ATGCTAGT
Target: ATGCTGTT
[Figure: the read ATGCTAGT placed against the target, with matching and mismatching positions marked]
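A rough sketch of seed and extend on the read and target from this slide is given below. The seed length, the mismatch threshold and the function name are assumptions made for illustration, and the extension step here only counts mismatches without gaps, whereas real aligners typically run a local alignment that also handles indels.

```python
def seed_and_extend(read, target, seed_len=4, max_mismatches=2):
    """Return (target_position, mismatches) for each accepted ungapped alignment.

    seed_len and max_mismatches are illustrative values, not tool defaults.
    """
    seed = read[:seed_len]
    hits = []
    start = target.find(seed)           # seed step: exact match of the seed
    while start != -1:
        window = target[start:start + len(read)]
        if len(window) == len(read):    # extend step: compare the full read
            mismatches = sum(1 for a, b in zip(read, window) if a != b)
            if mismatches <= max_mismatches:
                hits.append((start, mismatches))
        start = target.find(seed, start + 1)
    return hits

hits = seed_and_extend("ATGCTAGT", "ATGCTGTT")
print(hits)
```

The exact seed lookup is cheap, so the slower comparison is only run at the few locations where a seed hits.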
RNA-Seq: Special Mapping Concerns
[Figure; source: www.ensembl.org]
[Figure: Alternative splicing; source: genome.gov]
• For RNA sequencing data, many reads will map to the reference genome, but many will not because, coming from RNA, they span exon–exon junctions.
• Methods to deal with junction reads:
– Align to the reference transcriptome (requires a well-annotated transcriptome).
– Align to the reference genome, build a junction library from known adjacent exons, and then align the unmapped reads to the junction library.
– Map reads to the genome and identify putative exons (indel-finding algorithm); using these candidate exons, build all possible exon-exon junctions.
– De novo assembly of RNA-Seq reads.
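The junction-library approach from the list above can be illustrated with a toy example. Everything here is hypothetical: the genome string, the exon coordinates, the `flank` parameter and the function name are invented for the sketch, which only joins the end of each exon to the start of the next known adjacent exon.

```python
def build_junction_library(genome, exons, flank):
    """Build junction sequences from adjacent exons.

    exons: sorted list of (start, end) half-open coordinates (toy data).
    flank: number of bases taken from each side of the junction.
    """
    library = {}
    for (s1, e1), (s2, e2) in zip(exons, exons[1:]):
        left = genome[max(s1, e1 - flank):e1]    # tail of the upstream exon
        right = genome[s2:min(e2, s2 + flank)]   # head of the downstream exon
        library[f"{e1}-{s2}"] = left + right
    return library

# Toy genome: exon "AAACCC", intron "GTTTAG", exon "GGGTTT".
genome = "AAACCC" + "GTTTAG" + "GGGTTT"
library = build_junction_library(genome, exons=[(0, 6), (12, 18)], flank=4)

junction_read = "CCCGGG"  # spans the exon-exon junction
print(junction_read in genome, junction_read in library["6-12"])
```

The read does not occur in the genome because the intron separates its two halves, but it does occur in the junction sequence.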
RNA-Seq: Special Mapping Concerns
Genome Biology 2013 14:R36, DOI: 10.1186/gb-2013-14-4-r36
Reference Based Mapping Methods
BMC Genomics. 2014; 15(1): 570, Doi: 10.1186/1471-2164-15-570
Tophat2
Genome Biology 2013 14:R36, DOI: 10.1186/gb-2013-14-4-r36
Transcript Assembly
IEEE/ACM Trans Comput Biol Bioinform. 2013 Sep-Oct; 10(5): 1234–1240.
Summarizing Reads
• Aggregate reads over biologically meaningful units such as transcripts or genes
• Count the number of reads overlapping the exons of a gene (but a significant proportion of the reads will also map outside annotated regions)
Genome Biology 2010 11:220, DOI: 10.1186/gb-2010-11-12-220
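Counting reads over the exons of each gene could look like the sketch below. The gene names, coordinates and reads are invented, and the function is a simplification: a read overlapping two genes is counted for both, which dedicated counting tools treat as an ambiguity to resolve.

```python
def count_reads_per_gene(genes, reads):
    """Count reads overlapping any exon of each gene.

    genes: {name: [(exon_start, exon_end), ...]}, half-open coordinates.
    reads: [(read_start, read_end), ...], half-open coordinates.
    Returns (counts per gene, number of reads outside all annotated exons).
    """
    counts = {name: 0 for name in genes}
    unassigned = 0
    for r_start, r_end in reads:
        hit = False
        for name, exons in genes.items():
            # A read is counted once per gene, even if it spans two exons.
            if any(r_start < e_end and e_start < r_end for e_start, e_end in exons):
                counts[name] += 1
                hit = True
        if not hit:
            unassigned += 1
    return counts, unassigned

genes = {"geneA": [(100, 200), (300, 400)], "geneB": [(1000, 1100)]}  # toy annotation
reads = [(150, 180), (190, 310), (390, 420), (500, 530), (1050, 1080)]
counts, unassigned = count_reads_per_gene(genes, reads)
print(counts, unassigned)
```

The unassigned tally corresponds to the reads that map outside annotated regions.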
Count Normalization
• Number of reads aligned to a gene gives a measure of its level of expression
• Normalization of the count data
– Sequencing depth
– Length bias
[Figure: read counts for a short transcript (low) and a long transcript (high); labels include Isoform 1 and exon union]
Nature Methods 8, 469–477 (2011), Doi:10.1038/nmeth.1613
Count Normalization
• RPKM (Reads Per Kilobase of exon model per Million mapped reads)
• FPKM (Fragments Per Kilobase of exon model per Million mapped reads)
• TPM (Transcripts per million)
\[ \mathrm{RPKM} = \frac{\text{raw number of reads}}{\text{exon length (kb)} \times \dfrac{\text{number of mapped reads in the sample}}{1{,}000{,}000}} \]
Count Normalization
Gene/Transcript Name     R1 counts    R2 counts
A (50 kb)                37000        70000
B (100 kb)               50000        110000
C (200 kb)               50000        88000
D (-- kb)                ----         ----
XDD (-- kb)              ----         ----
Total number of reads    2000000      4000000
RPKM Calculation
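As a worked example, the RPKM definition (reads divided by exon length in kilobases and by millions of mapped reads) can be applied to genes A, B and C from the table. The sketch assumes that the "total number of reads" row can stand in for the number of mapped reads in each sample; the variable and function names are illustrative.

```python
def rpkm(read_count, length_kb, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    return read_count / (length_kb * (total_mapped_reads / 1_000_000))

lengths_kb = {"A": 50, "B": 100, "C": 200}
counts = {
    "R1": {"A": 37000, "B": 50000, "C": 50000},
    "R2": {"A": 70000, "B": 110000, "C": 88000},
}
# Assumption: the table's total read count is used as the mapped-read count.
totals = {"R1": 2_000_000, "R2": 4_000_000}

rpkm_table = {
    sample: {gene: rpkm(n, lengths_kb[gene], totals[sample])
             for gene, n in gene_counts.items()}
    for sample, gene_counts in counts.items()
}
for sample, values in rpkm_table.items():
    print(sample, values)
```

Genes B and C have the same raw count in R1, but C is twice as long, so its RPKM is half that of B. Likewise, the raw counts in R2 are close to double those in R1 while the RPKM values stay comparable, because R2 was sequenced twice as deeply.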
Differential Expression
• Goal of the DE analysis is to identify the genes
for which abundance across different
experimental conditions has changed
significantly
• Biological replicates (to account for biological
variation)
• Ranked list of genes with associated p-values
and fold changes
• DE tools: edgeR, DESeq
Alignment Independent Quantification
• Sailfish
• Salmon
• Kallisto
Main Idea
• Quantify the abundance of known transcripts
• Read mapping is unnecessary
• Replace inexact pattern matching with exact sub-pattern counting
Sailfish
Nature Biotechnology 32, 462–464 (2014), Doi:10.1038/nbt.2862
Transcript: TACGTACTAGACCTAA….....
Read: TGCGTACTAGCCCT
K-mers are Robust to Errors
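The robustness claim can be checked on the transcript prefix and read shown above. This is only an illustration of exact k-mer matching, not the Sailfish algorithm itself; k = 4 is an assumption chosen to suit such short sequences, and real tools use longer k-mers.

```python
def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

transcript = "TACGTACTAGACCTAA"  # visible prefix of the transcript on the slide
read = "TGCGTACTAGCCCT"          # read from the slide, differing at two positions
k = 4                            # illustrative value only

transcript_kmers = set(kmers(transcript, k))
read_kmers = kmers(read, k)
matching = [kmer for kmer in read_kmers if kmer in transcript_kmers]

print(len(matching), "of", len(read_kmers), "read k-mers match exactly")
print(matching)
```

Only the k-mers that overlap one of the two differing bases fail to match; the k-mers between them still support the transcript.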
Kallisto
arXiv:1505.02710v2 [q-bio.QM]

RNASeq - Analysis Pipeline for Differential Expression


Editor's Notes

  1. RNA-Seq provides accurate maps of transcript start and end sites. It can detect sequence rearrangements and abnormal transcript structures (common in tumours). It reflects the current state of the cell and can reveal pathological mechanisms.
  2. In the past, techniques such as microarrays were used to study gene expression. A microarray consists of an array of probes whose sequences represent particular regions of the genes to be monitored. It has several limitations: high background levels due to cross-hybridization, and reliance on prior knowledge about the genome. On the other hand, the signal from RNA-Seq data is digital in nature because you get counts. It has base-pair-level resolution and a much higher dynamic range of expression levels. We can find novel transcripts and fusion products.
  3. Extract the RNA and remove contaminant DNA. If the goal of the experiment is expression profiling, polyA selection is used to enrich mRNA in eukaryotes, but it will miss non-coding RNAs and RNAs that lack polyA tails. The other library preparation option is to deplete rRNA. Library preparation can introduce biases such as amplification of GC-rich regions and generation of duplicate sequences.
  4. Pattern searching and data compression are old computational problems. Exact matching is very quick, but inexact matching (the Smith-Waterman algorithm), which takes SNPs and indels into account, is very slow.
  5. First build an index and find the most probable sites where reads can match. Then at these putative sites (narrowed down) do local alignment.
  6. Reads are coming from the mRNA and we are trying to match them to the genome.
  7. Splicing is a post-transcriptional modification in which non-coding regions (introns) are removed. Many transcripts will share exons.
  8. Transcriptomes are incomplete even for well-studied species.
  9. In the first step of the alignment, you can align reads either to the reference genome or to the transcriptome. Aligning to the transcriptome is a new feature in tophat2. It improves the overall accuracy and sensitivity of the mapping, and it also speeds up the analysis because the transcriptome is smaller. Some reads will not be mapped because they come from unknown transcripts that are not present in the annotation, and there will also be poorly aligned reads. The next step is to take these unmapped reads and use them to find novel splice sites. tophat2 does this by splitting the unmapped reads into non-overlapping segments, 25 bp long by default, and aligning these segments against the genome. The maximum intron size is 100 kb by default, and that is the window in which it looks for matches of the left and right segments. When that pattern is detected, tophat2 tries to find the most likely location of the splice sites. After detecting the splice junctions, tophat2 puts the segments together based on known junction signals (GT-AG, GC-AG and AT-AC).
  10. Overview of RNA-seq analysis. Reads produced by an RNA-seq experiment are aligned to the genome, then clustered into a graph structure that is traversed to recover all possible isoforms at one locus. Lastly, a subset of transcripts is selected and their abundance quantified from the input reads.
  11. Number of reads aligned gives a measure of the level of expression
  12. Cell type specific exon
  13. Let A and B be two RNA-seq experiments under the same conditions, meaning that there are no differentially expressed genes. If experiment A generates twice as many reads as B, it is likely that the counts from experiment A will be doubled. Length bias: the expected number of reads mapped to a gene is proportional to both the abundance and the length of the isoforms transcribed from that gene. Adjust for the sequencing depth (the "million" part) and for the gene length (the "kilobase" part).
  14. A read with errors still has many 'good' k-mers. Only k-mers overlapping errors will be discarded or miscounted.