Brendel Group Presentation: 4 Mar 2014


A review of the 3 transcript reconstruction modes available in the Trinity RNA-Seq package.


  1. Transcript reconstruction algorithms available in the Trinity RNA-Seq package. Daniel Standage, Brendel Group, Indiana University, 4 Mar 2014. Running footer: Daniel Standage (Brendel Group @ IU), Trinity Assembly, 4 Mar 2014.
  2. Introduction / RNA-Seq. RNA-Seq: examination of transcriptomes that is deep, effective, and affordable.
  3. Introduction / RNA-Seq. RNA-Seq: high throughput comes at the expense of contiguity.
  4. Introduction / RNA-Seq. RNA-Seq: high throughput comes at the expense of contiguity... well, at least for now.
  5. Introduction / Assembly with Trinity. Transcriptome assembly: in the absence of full-length transcript sequences, reconstruct full-length sequences from fragments.
  6. Introduction / Assembly with Trinity. Trinity RNA-Seq.
  7. Introduction / Assembly with Trinity. Trinity RNA-Seq: now with 3 transcript reconstruction modes! Butterfly (default); --PasaFly; --CuffFly.
  8. Introduction / Assembly with Trinity. Review outline: Trinity algorithm; PASA algorithm; Cufflinks algorithm; Discussion.
  9. Trinity / Inchworm. Step 1: Inchworm. Assemble unique contigs representing transcript subsequences. Often produces the dominant isoform in full length, and then just the unique portions of alternative isoforms.
  10. Trinity / Inchworm. Inchworm procedure:
      1. Create a dictionary of k-mers (k = 25).
      2. Remove k-mers containing probable errors (based on coverage?).
      3. Select the most frequent k-mer.
      4. Build a contig by extending the k-mer: find the most frequent k-mer with a k − 1 bp overlap, extend by 1 bp, and remove that k-mer from the dictionary.
      5. Repeat the previous step until the contig cannot be extended further, then report the contig.
      6. Repeat steps 3-5 until all k-mers are exhausted.
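The greedy extension in the Inchworm procedure can be illustrated with a small Python sketch. This is not Trinity's implementation: the function names, the `min_count` error filter (a stand-in for step 2), and the tie-breaking rule are assumptions made for illustration, and reverse complements are ignored.

```python
from collections import Counter


def count_kmers(reads, k):
    """Step 1: build a dictionary mapping each k-mer to its occurrence count."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend(contig, counts, k, direction):
    """Steps 4-5: extend one base at a time using the most frequent
    remaining k-mer that overlaps the contig end by k - 1 bases."""
    while True:
        if direction == "right":
            stem = contig[-(k - 1):]
            candidates = [stem + base for base in "ACGT"]
        else:
            stem = contig[:k - 1]
            candidates = [base + stem for base in "ACGT"]
        candidates = [c for c in candidates if c in counts]
        if not candidates:
            return contig  # cannot be extended further
        best = max(candidates, key=lambda c: counts[c])
        del counts[best]  # each k-mer is used at most once
        if direction == "right":
            contig = contig + best[-1]
        else:
            contig = best[0] + contig


def inchworm_sketch(reads, k=25, min_count=1):
    """Greedy contig construction; requires k >= 2."""
    counts = count_kmers(reads, k)
    # Step 2 stand-in (assumption): treat rare k-mers as probable errors.
    counts = Counter({km: n for km, n in counts.items() if n >= min_count})
    contigs = []
    while counts:  # step 6: repeat until all k-mers are exhausted
        # Step 3: most frequent k-mer; ties broken alphabetically (assumption).
        seed = max(counts, key=lambda km: (counts[km], km))
        del counts[seed]
        contig = extend(seed, counts, k, "right")
        contig = extend(contig, counts, k, "left")
        contigs.append(contig)
    return contigs


# Two overlapping toy reads, with a small k so the example is easy to follow.
print(inchworm_sketch(["ATGCGT", "GCGTAC"], k=4))  # ['ATGCGTAC']
```

Because every k-mer is consumed once, a shared region ends up in only one contig, which is consistent with the slide 9 remark that alternative isoforms are often represented only by their unique portions.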
  11. Trinity / Chrysalis. Step 2: Chrysalis. Group Inchworm contigs and construct a de Bruijn graph for each cluster. Each connected component of the graph corresponds to one or more genes with shared sequence.
  12. Trinity / Chrysalis. Chrysalis procedure:
      1. Group contigs if they share a perfect overlap of k − 1 bp (with reads supporting the overlap).
      2. Build a de Bruijn graph with word size k − 1 for nodes and k for edges; edges are weighted by supporting reads.
      3. Assign each read to the component with which it shares the largest number of k-mers.
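Steps 2 and 3 of the Chrysalis procedure can be sketched as follows. This is a minimal illustration, not Trinity's code: the function names and the dictionary-based graph representation are assumptions, and "supporting reads" is simplified to the number of times a k-mer occurs in the reads.

```python
from collections import Counter


def kmers(seq, k):
    """All k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_de_bruijn(contigs, reads, k):
    """Nodes are (k-1)-mers; each k-mer of a contig is an edge from its
    prefix to its suffix, weighted by how often it occurs in the reads."""
    read_counts = Counter(km for read in reads for km in kmers(read, k))
    edges = {}
    for contig in contigs:
        for km in kmers(contig, k):
            edges[(km[:-1], km[1:])] = read_counts[km]
    return edges


def assign_read(read, components, k):
    """Assign a read to the component sharing the most k-mers with it.
    `components` maps a component name to its set of k-mers.
    Returns None if the read shares no k-mer with any component."""
    read_kmers = set(kmers(read, k))
    best_name, best_shared = None, 0
    for name, component_kmers in components.items():
        shared = len(read_kmers & component_kmers)
        if shared > best_shared:
            best_name, best_shared = name, shared
    return best_name


reads = ["ATGCGT", "GCGTAC"]
edges = build_de_bruijn(["ATGCGTAC"], reads, k=4)
print(edges[("GCG", "CGT")])  # 2: the k-mer GCGT occurs in both reads

components = {
    "c1": set(kmers("ATGCGTAC", 4)),
    "c2": set(kmers("TTTTGGGG", 4)),
}
print(assign_read("GCGTAC", components, k=4))  # c1
```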
  13. Trinity / Butterfly. Step 3: Butterfly. Traverse read-supported paths in each subgraph and enumerate plausible sequences.
  14. Trinity / Butterfly. Butterfly procedure:
      1. Graph simplification: merge consecutive nodes in linear paths, pruning minor deviations.
      2. Plausible path scoring: identify paths in the graph with read support. Initialize the DP table with source nodes (no incoming edges), then fill in the table by extending path prefixes by one node.
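The prefix-extension idea in the Butterfly procedure can be sketched loosely in Python. The slides do not define how read support is scored, so the rule used here is an assumption: an extension is kept only if the last `window` nodes of the extended path occur contiguously in at least one read, where each read is given as a tuple of node names. The function names and the worklist (in place of an explicit DP table) are also illustrative choices.

```python
def sources(graph):
    """Nodes with no incoming edges."""
    targets = {v for successors in graph.values() for v in successors}
    return [u for u in graph if u not in targets]


def supported(path, read_paths, window):
    """Assumed support rule: the last `window` nodes of the path must
    appear contiguously in at least one read path."""
    tail = tuple(path[-window:])
    n = len(tail)
    return any(
        tail == read[i:i + n]
        for read in read_paths
        for i in range(len(read) - n + 1)
    )


def enumerate_paths(graph, read_paths, window=3):
    """graph: dict mapping every node to a list of successors (acyclic).
    Starts from source nodes and extends path prefixes one node at a time,
    keeping only read-supported extensions."""
    prefixes = [[s] for s in sources(graph)]
    finished = []
    while prefixes:
        path = prefixes.pop()
        extended = False
        for nxt in graph[path[-1]]:
            candidate = path + [nxt]
            if supported(candidate, read_paths, window):
                prefixes.append(candidate)
                extended = True
        if not extended:
            finished.append(path)  # sink reached or no supported extension
    return sorted(finished)


# Two alternative starts (A, B) and two alternative ends (D, E) share node C.
graph = {"A": ["C"], "B": ["C"], "C": ["D", "E"], "D": [], "E": []}
read_paths = [("A", "C", "D"), ("B", "C", "E")]
print(enumerate_paths(graph, read_paths))
# [['A', 'C', 'D'], ['B', 'C', 'E']]
```

The graph alone admits four source-to-sink paths; the reads that span node C rule out A-C-E and B-C-D, which is the sense in which only read-supported paths are reported.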
  15. PASA. PASA: Program to Assemble Spliced Alignments. Designed for ESTs and FL-cDNAs (pre-NGS era); works on sequence alignments; computes consensus spliced alignments.
  16. PASA. PASA algorithm. Input: a set of spliced cDNA alignments A. Output: for each alignment a ∈ A, the largest assembly containing a.
      1. Sort alignments.
      2. Test overlapping alignments for compatibility.
      3. Build DP table, backtrace to find maximal assembly A∗.
      4. If ∃ a ∉ A∗, build reciprocal DP table, trace to enumerate additional assemblies.
  17. PASA. PASA algorithm: recurrences.
      $L_a = \max_b \{ C_a,\; L_b + C_{a/b} \}$
      $R_a = \max_b \{ C_a,\; R_b + C_{a/b} \}$
      $L_a$, $R_a$: maximum number of cDNAs in an assembly that contains alignment $a$, starting from the left and right (respectively).
      $C_a$: number of $a$-compatible alignments in the span of $a$.
      $C_{a/b}$: number of $a$-compatible alignments in the span of $a$ but not in the span of $b$.
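As a small worked instance of the left-to-right recurrence, the sketch below evaluates L_a when the counts C_a and C_{a/b} are already known. Everything about the inputs is hypothetical: the slides do not say which alignments b the maximum ranges over, so the caller supplies a list of candidate predecessors per alignment, and the numbers are made up for illustration.

```python
def left_scores(order, C, C_excl, preds):
    """Evaluate L_a = max_b { C_a, L_b + C_{a/b} } in sorted order.

    order  : alignments sorted left to right
    C      : C[a], the count C_a
    C_excl : C_excl[(a, b)], the count C_{a/b}
    preds  : preds[a], candidate alignments b considered for a (assumed input)
    """
    L = {}
    for a in order:
        candidates = [C[a]]
        candidates += [L[b] + C_excl[(a, b)] for b in preds[a]]
        L[a] = max(candidates)
    return L


# Hypothetical toy values for three alignments.
order = ["a1", "a2", "a3"]
C = {"a1": 1, "a2": 1, "a3": 1}
C_excl = {("a2", "a1"): 1, ("a3", "a1"): 1, ("a3", "a2"): 1}
preds = {"a1": [], "a2": ["a1"], "a3": ["a1", "a2"]}

print(left_scores(order, C, C_excl, preds))
# {'a1': 1, 'a2': 2, 'a3': 3}
```

The right-to-left scores R_a follow from the same function applied to the reversed order with the corresponding predecessor lists.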
  18. PASA. PASA algorithm.
  19. Cufflinks. Designed for short transcript reads (NGS era); works on read alignments (mappings); identifies the smallest number of transcripts that “explain” the read mappings.
  20. Cufflinks. Cufflinks algorithm. Input: overlap graph G of mapped reads. Output: a minimal path cover of G, with each path corresponding to a single assembled transcript.
      1. Alignments divided into non-overlapping loci.
      2. Erroneous read alignments removed.
      3. Compute transitive reduction of G.
      4. Construct bipartite graph G∗ from the transitive closure of G, with edges weighted by coverage to “phase” distant exons by their coverage.
      5. Compute minimum-cost maximal matching in G∗, which corresponds to a minimum path cover of G.
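The link between bipartite matching and a minimum path cover can be sketched in Python. This is a simplified, unweighted illustration and not Cufflinks itself: the coverage-based edge costs are omitted, so the matching is only maximum in size rather than minimum-cost, and the function names and graph representation are assumptions.

```python
def transitive_closure(graph):
    """graph: dict mapping every node to a set of successors (a DAG).
    Returns a dict mapping each node to all nodes reachable from it."""
    closure = {}

    def reach(u):
        if u not in closure:
            closure[u] = set()
            for v in graph[u]:
                closure[u] |= {v} | reach(v)
        return closure[u]

    for u in graph:
        reach(u)
    return closure


def max_matching(left_nodes, adj):
    """Maximum bipartite matching by augmenting paths.
    Returns a dict mapping each matched right node to its left partner."""
    match_of_right = {}

    def try_augment(u, seen):
        for v in sorted(adj[u]):
            if v in seen:
                continue
            seen.add(v)
            if v not in match_of_right or try_augment(match_of_right[v], seen):
                match_of_right[v] = u
                return True
        return False

    for u in left_nodes:
        try_augment(u, set())
    return match_of_right


def min_path_cover(graph):
    """Each matched pair (u, v) means v directly follows u in some path,
    so the number of paths is (number of nodes) - (matching size)."""
    closure = transitive_closure(graph)
    match_of_right = max_matching(sorted(graph), closure)
    next_of = {u: v for v, u in match_of_right.items()}
    paths = []
    for start in sorted(graph):
        if start in match_of_right:
            continue  # has a predecessor, so it is not the start of a path
        path = [start]
        while path[-1] in next_of:
            path.append(next_of[path[-1]])
        paths.append(path)
    return paths


# Toy compatibility graph: A and B both lead into C, which leads to D or E.
graph = {"A": {"C"}, "B": {"C"}, "C": {"D", "E"}, "D": set(), "E": set()}
paths = min_path_cover(graph)
print(len(paths))  # 2
```

In the toy graph, five nodes and a matching of size three give two paths, which would correspond to two assembled transcripts. Which of the equally small covers is returned depends on iteration order here; in the slide's description, the coverage weights are what make that choice.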
  21. Discussion. Three different construction approaches. Butterfly: enumerate all plausible transcripts with minimal read support. PASA: for each alignment, find the largest assembly (transcript) containing the alignment. Cufflinks: find the minimal assembly or assemblies that explain the data, using read coverage to “phase” distant exons.
  22. Discussion. Next time: comparison of 8 Trinity assemblies. Four assembly settings: Butterfly; --PasaFly; --CuffFly; Butterfly with --min_kmer_cov 2. Two input data sets: groomed data; groomed data with digital normalization.
  23. Discussion. Next time: comparison of 8 Trinity assemblies. Hypotheses (transcripts per assembly): Butterfly > PasaFly > CuffFly; Diginorm > No diginorm.
  24. Discussion. Thank you!