Sarder Arifuzzaman
Student ID: 2018210127
Date: 2018/10/12
1
Contents
• Introduction
• Results
• Discussion
• Conclusion
2
• Next-generation sequencing is rapidly becoming the method of choice for
transcriptional profiling experiments.
• In contrast to microarray technology, high-throughput sequencing allows
identification of novel transcripts, does not require a sequenced genome,
and circumvents the background noise associated with fluorescence
quantification.
• Furthermore, unlike hybridization-based detection, RNA-seq allows genome-wide
analysis of transcription at single-nucleotide resolution, including
identification of alternative splicing events and post-transcriptional RNA
editing events.
Introduction
3
Advantages and disadvantages of RNA-seq compared to microarrays
Introduction
4
A typical RNA-seq experiment
• Preparation of total RNA.
Depending on the class of RNA to be sequenced (e.g., mRNA, lincRNA, microRNA),
an enrichment step is performed. Good-quality total RNA is critical, although
alternative protocols for degraded RNA exist.
• Library preparation.
Library preparation consists of:
RNA fragmentation. Unlike short RNAs, mRNAs are typically fragmented into smaller
pieces to enable sequencing.
Reverse transcription. First-strand cDNA is reverse transcribed from the fragmented
RNA using random hexamers or oligo(dT) primers, and second-strand cDNA is then
synthesized.
Adapter ligation. The 5’ and/or 3’ ends of the cDNA are repaired, and adapters
(containing sequences that allow hybridization to a flow cell) are ligated.
Library cleanup and amplification. Libraries are enriched for correctly ligated cDNA
fragments and amplified by PCR to add any remaining sequencing primer sequences.
Library quantification, quality control and sequencing. Library concentration is
assessed using qPCR and/or a Bioanalyzer, after which the library is ready for sequencing.
• Data analysis.
Downstream data analysis consists of quality control, such as trimming of sequencing
adapters and removal of reads with poor quality scores, followed by read mapping,
differential expression analysis, identification of novel transcripts and pathway
analysis.
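The quality-control step described above (adapter trimming, then removal of low-quality reads) can be illustrated with a minimal plain-Python sketch. It is not the method of any specific tool: the adapter prefix `AGATCGGAAGAGC`, exact-match trimming, Phred+33 quality encoding, and the thresholds (at least 20 bases, mean quality of at least 20) are all assumptions chosen for illustration. Dedicated trimmers also tolerate sequencing errors inside the adapter and can trim low-quality read ends.

```python
from dataclasses import dataclass
from typing import Iterator, List

# Assumed adapter prefix for this example; replace it with your kit's adapter.
ADAPTER = "AGATCGGAAGAGC"


@dataclass
class Read:
    name: str
    seq: str
    qual: str  # quality string, assumed Phred+33 encoded


def phred_scores(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer quality scores."""
    return [ord(c) - 33 for c in qual]


def trim_adapter(read: Read, adapter: str = ADAPTER, min_overlap: int = 3) -> Read:
    """Cut the read where the adapter starts (exact matches only, no mismatches)."""
    cut = read.seq.find(adapter)
    if cut == -1:
        # No full adapter: look for an adapter prefix running off the 3' end.
        longest = min(len(adapter), len(read.seq)) - 1
        for k in range(longest, min_overlap - 1, -1):
            if read.seq.endswith(adapter[:k]):
                cut = len(read.seq) - k
                break
    if cut == -1:
        return read
    return Read(read.name, read.seq[:cut], read.qual[:cut])


def passes_qc(read: Read, min_len: int = 20, min_mean_q: float = 20.0) -> bool:
    """Keep reads that are long enough and have a high enough mean quality."""
    if not read.seq or len(read.seq) < min_len:
        return False
    scores = phred_scores(read.qual)
    return sum(scores) / len(scores) >= min_mean_q


def parse_fastq(text: str) -> Iterator[Read]:
    """Parse simple 4-line FASTQ records (no multi-line sequences)."""
    lines = [line for line in text.splitlines() if line]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield Read(header[1:], seq, qual)


def quality_control(text: str) -> List[Read]:
    trimmed = (trim_adapter(r) for r in parse_fastq(text))
    return [r for r in trimmed if passes_qc(r)]


# Toy input: r1 has a full adapter, r4 a partial one; r2 is low quality, r3 too short.
demo = "\n".join([
    "@r1", "ACGT" * 6 + ADAPTER, "+", "I" * 37,
    "@r2", "ACGT" * 6, "+", "#" * 24,
    "@r3", "ACGT" * 3, "+", "I" * 12,
    "@r4", "ACGT" * 6 + "AGATC", "+", "I" * 29,
])
kept = quality_control(demo)
print([(r.name, len(r.seq)) for r in kept])  # [('r1', 24), ('r4', 24)]
```

Trimming runs before filtering so that the length check applies to the insert that will actually be mapped.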
Introduction
5
RNA-seq technologies
The three most widely used NGS platforms for RNA-seq are SOLiD and Ion Torrent,
both marketed by Thermo Fisher, and Illumina’s HiSeq. All three platforms have similar
sample input requirements and sequence millions of cDNA fragments per run. Below,
sample preparation and pertinent application-specific advantages and disadvantages
are discussed.
Introduction
6
Illumina
Introduction
7
Ion Torrent and SOLiD libraries are both prepared using similar protocols.
Introduction
8
Non-coding RNA-seq
MicroRNAs are sequenced by ligating RNA adapters to each end of the mature
microRNA, followed by reverse transcription and PCR (RT-PCR).
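Because a mature microRNA is shorter than a typical read, each read usually contains the microRNA insert followed by the 3' adapter. A minimal sketch of recovering and counting those inserts is shown below; the adapter sequence, the 8-base seed match, and the 18 to 25 nt size window are assumptions for illustration, not part of the protocol described above.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumed 3' adapter used only for this example; use your kit's sequence.
ADAPTER_3P = "TGGAATTCTCGG"


def extract_insert(read: str, adapter: str = ADAPTER_3P, seed: int = 8) -> Optional[str]:
    """Return the bases upstream of the 3' adapter, or None if its seed is absent."""
    pos = read.find(adapter[:seed])
    return read[:pos] if pos != -1 else None


def count_mature_mirnas(reads: Iterable[str], min_len: int = 18, max_len: int = 25) -> Counter:
    """Collapse identical inserts within a mature-microRNA size window and count them."""
    counts: Counter = Counter()
    for read in reads:
        insert = extract_insert(read)
        if insert is not None and min_len <= len(insert) <= max_len:
            counts[insert] += 1
    return counts


mirna = "TGAGGTAGTAGGTTGTATAGTT"  # a 22-nt insert (toy data)
reads = [
    mirna + ADAPTER_3P + "AAAA",  # insert + adapter + trailing bases
    mirna + ADAPTER_3P,           # same molecule sequenced again
    "ACGTACGTAC" + ADAPTER_3P,    # 10-nt insert: outside the size window
    "ACGTACGTACGTACGTACGTACGT",   # no adapter seed: discarded
]
counts = count_mature_mirnas(reads)
print(counts.most_common())  # [('TGAGGTAGTAGGTTGTATAGTT', 2)]
```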
Introduction
9
Tool                         Version       Used in (pipeline step)
SAMtools                     1.2           align, reconstruct, long_align, long_reconstruct, editing
HISAT2                       2.0.5         align
StringTie                    1.3.3         reconstruct, diff
Salmon                       0.8.0         quantify
Oases                        0.2.09        assembly
Velvet                       1.2.10        assembly
R (DESeq2, readr, tximport)  3.3.2         diff, editing
featureCounts                1.5.0-p1      diff
LoRDEC                       0.6           long_correct
STAR                         2.5.2b        long_align
IDP                          0.1.9         long_reconstruct
IDP-fusion                   1.1.1         long_fusion
GATK                         3.5-0         variant, editing
Picard                       2.2.2         variant
GIREMI                       0.2.1         editing
gatb-core                    1.1.0         editing
HTSlib                       1.3           editing
FusionCatcher                0.99.5a beta  fusion
bowtie                       1.2.0         fusion
bowtie2                      2.2.9         fusion, long_fusion
bwa                          0.7.15        fusion
SRA Toolkit                  2.8.1         fusion
coreutils                    8.25          fusion
pigz                         2.3.1         fusion
blat                         0.35          fusion
faToTwoBit                   -             fusion
liftOver                     -             fusion
SeqTK                        1.0-r82b      fusion
gmap                         2017-02-15    long_fusion
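DESeq2, listed above for the `diff` step, corrects for differences in sequencing depth with the median-of-ratios method: each sample's size factor is the median, over genes, of the ratio between the gene's count in that sample and the gene's geometric mean across all samples. The sketch below re-implements only that normalization in plain Python to show the arithmetic; it is not DESeq2's API, it omits dispersion estimation and testing, and the count matrix is toy data.

```python
import math
from statistics import median
from typing import List


def size_factors(counts: List[List[int]]) -> List[float]:
    """Median-of-ratios size factors; counts[i][j] is the read count of gene i in sample j."""
    n_samples = len(counts[0])
    # Genes with a zero in any sample have a geometric mean of 0, so they are skipped.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    # Log of each gene's geometric mean across samples (a pseudo-reference sample).
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts: List[List[int]], factors: List[float]) -> List[List[float]]:
    """Divide each sample's counts by that sample's size factor."""
    return [[c / s for c, s in zip(row, factors)] for row in counts]


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
sf = size_factors(counts)
print([round(s, 4) for s in sf])  # [0.7071, 1.4142]
print(normalize(counts, sf)[1])   # about 141.42 in both samples
```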
Introduction (Tools for data analysis)
10
The table below summarizes the output files generated by each tool.
Task                                     Command           Default tool    Output files
Short-read alignment                     align             HISAT2          alignments: alignments.sorted.bam; junctions: splicesites.tab
Short-read transcriptome reconstruction  reconstruct       StringTie       transcripts: transcripts.gtf; expressions: gene_abund.tab
Short-read quantification                quantify          Salmon-SMEM     expressions: quant.sf
Short-read differential expression       diff              DESeq2          differential expressions: deseq2_res.tab
Short-read de novo assembly              denovo            Oases           transcripts: transcripts.fa
Long-read error correction               long_correct      LoRDEC          corrected reads: long_corrected.fa
Long-read alignment                      long_align        STARlong        alignments: Aligned.out.psl
Long-read transcriptome reconstruction   long_reconstruct  IDP             transcripts: isoform.gtf; expressions: isoform.exp
Long-read fusion detection               long_fusion       IDP-fusion      fusions: fusion_report.tsv
Variant calling                          variant           GATK            variants: variants_filtered.vcf
RNA editing detection                    editing           GIREMI          edits: giremi_out.txt.res
RNA fusion detection                     fusion            FusionCatcher   fusions: final-list_candidate-fusion-genes.txt
Running all steps                        all               whole pipeline  all outputs of the successful steps
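Salmon's `quant.sf`, the output of the `quantify` step, is a tab-separated file with the columns `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads`. TPM (transcripts per million) divides each transcript's read count by its effective length and rescales the results so they sum to one million. The minimal sketch below recomputes TPM from the `NumReads` and `EffectiveLength` columns; the file content is made-up toy data and the function name is hypothetical.

```python
import csv
import io
from typing import Dict

# Toy quant.sf content (tab-separated, Salmon's column names; values are made up).
QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800\t400000\t400\n"
    "tx2\t2000\t1600\t200000\t400\n"
    "tx3\t500\t400\t400000\t200\n"
)


def tpm_from_quant(text: str) -> Dict[str, float]:
    """Recompute TPM: reads per base of effective length, rescaled to sum to one million."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    # Transcripts with zero effective length would divide by zero, so they are skipped.
    rates = {
        r["Name"]: float(r["NumReads"]) / float(r["EffectiveLength"])
        for r in rows
        if float(r["EffectiveLength"]) > 0
    }
    total = sum(rates.values())
    return {name: 1e6 * rate / total for name, rate in rates.items()}


tpm = tpm_from_quant(QUANT_SF)
print(tpm)  # {'tx1': 400000.0, 'tx2': 200000.0, 'tx3': 400000.0}
```

Note that tx1 and tx2 receive the same number of reads but tx2 gets half the TPM, because its effective length is twice as long.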
Introduction
11
• The popularity of high-throughput next-generation sequencing
(NGS) ushered in a new era in transcriptome analysis with RNA-seq.
• Widespread application of RNA-seq requires workflows tuned to
the sequencing technologies involved, the sample types, the desired
analyses, and the availability of genomic and computational
resources.
• Depending on the workflow used, the accuracy, speed, and cost
of analysis can vary significantly.
• Thus, it is crucial to study the tradeoffs involved at different steps
of an RNA-seq analysis to achieve the best accuracy subject to cost
and performance constraints.
• Furthermore, finding the optimal workflow is even more
challenging because, in general, the best overall approaches may
perform sub-optimally on a specific data set in terms of a
specific measure, which necessitates a comprehensive analysis of
workflows using a wide variety of data sets.
Introduction
12
• The authors report the performance of tools at each RNA-seq analysis step and
propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with
a computational pipeline that achieves high accuracy.
• Validation on different samples shows that the proposed
protocol could help researchers extract more biologically
relevant predictions through broad analysis of the transcriptome.
Introduction
13
Overall de-novo RNA-seq analysis flowchart
14
Steps of the current RNACocktail computational pipeline for general-purpose RNA-seq analysis
15
Several efforts have been made to compare the performance of different RNA-seq analysis
tools.
However, these studies have mostly focused on a single RNA-seq analysis step, or their
workflow analyses were limited to one or two steps, such as alignment and
quantification.
Thus, a comprehensive and systematic analysis of RNA-seq data from different
perspectives can contribute significantly toward extracting maximal insight from
RNA-seq data.
Research question
16
The proposed RNACocktail analysis protocol
17
Comparative performance of the tools in isoform detection using short reads
18
Comparative efficiencies of the tools in reference-based transcript identification
19
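The slide shows only the title of the comparison figure, so its metrics are not reproduced here. As an assumption for illustration (not necessarily the scoring used in the study), reference-based transcript identification is often scored by counting a predicted multi-exon transcript as correct when its intron chain exactly matches a reference transcript, then reporting sensitivity, precision and their harmonic mean (F1). A minimal sketch with toy coordinates:

```python
from typing import Dict, List, NamedTuple, Tuple


class Transcript(NamedTuple):
    chrom: str
    strand: str
    introns: Tuple[Tuple[int, int], ...]  # sorted (start, end) coordinates of each intron


def intron_chains(transcripts: List[Transcript]) -> set:
    """Unique intron chains of multi-exon transcripts (single-exon ones are ignored)."""
    return {(t.chrom, t.strand, t.introns) for t in transcripts if t.introns}


def score(predicted: List[Transcript], reference: List[Transcript]) -> Dict[str, float]:
    pred, ref = intron_chains(predicted), intron_chains(reference)
    matched = len(pred & ref)
    sensitivity = matched / len(ref) if ref else 0.0  # fraction of reference recovered
    precision = matched / len(pred) if pred else 0.0  # fraction of predictions that match
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if matched else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


# Toy example: two of the three predicted intron chains match the reference.
ref = [
    Transcript("chr1", "+", ((100, 200), (300, 400))),
    Transcript("chr1", "+", ((100, 250),)),
    Transcript("chr2", "-", ((500, 600), (700, 800), (900, 950))),
]
pred = [
    Transcript("chr1", "+", ((100, 200), (300, 400))),
    Transcript("chr2", "-", ((500, 600), (700, 800), (900, 950))),
    Transcript("chr1", "+", ((100, 220), (300, 400))),  # one splice site off
]
print(score(pred, ref))
```

Single-exon transcripts have no intron chain, so a real evaluation would need a separate rule for them (for example, an exon-overlap threshold).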
Comparative performance in de novo transcript assembly
20
Comparative performance in isoform detection using long reads
21
Comparative performance in differential gene expression analysis
22
Comparative performance in variant calling, RNA editing, and RNA fusion detection
23
• In conclusion, this comprehensive assessment, with detailed investigation
at each analysis step, clearly outlines the current state of RNA-seq
analysis.
• This study highlights algorithmic issues that warrant researchers’
attention and leads to a broad-spectrum analysis protocol that can enable
researchers to unleash the full power of RNA-seq.
• The authors envision that this approach will help researchers gain better
and more comprehensive biological insights from their transcriptomic data,
as exemplified by the results of their pipeline, which is only one possible
instantiation of the comprehensive protocol.
Conclusion
24
Thank you very much
25

Bioinformatics class ppt arifuzzaman
