Transcript reconstruction algorithms available in the
Trinity RNA-Seq package
Daniel Standage
Brendel Group, Indiana University

4 Mar 2014

Daniel Standage (Brendel Group @ IU)

Trinity Assembly

4 Mar 2014

1 / 24
Introduction

RNA-Seq

Examination of transcriptomes
deep
effective
affordable

High throughput comes at the expense of
contiguity...well, at least for now.

Assembly with Trinity

Transcriptome assembly

In the absence of full-length transcript sequences,
reconstruct full-length sequences from fragments.

Trinity RNA-Seq

Now with 3 transcript reconstruction modes!
Butterfly (default)
--PasaFly
--CuffFly

Review outline

Trinity algorithm
PASA algorithm
Cufflinks algorithm
Discussion

Trinity

Inchworm

Step 1: Inchworm

Assemble unique contigs representing transcript
subsequences.
Often produces the dominant isoform in full length, but only the unique
portions of alternative isoforms.

Inchworm procedure

1. Create dictionary of k-mers (k = 25)
2. Remove k-mers containing probable errors (based on coverage?)
3. Select the most frequently occurring k-mer
4. Build contig by extending k-mer (find the most frequently occurring k-mer with k − 1 bp overlap, extend 1 bp), remove k-mer from dictionary
5. Repeat previous step until the contig cannot be extended further, report contig
6. Repeat steps 3-5 until all k-mers are exhausted
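
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not Inchworm's actual implementation: the function name `inchworm_contigs`, the `min_count` error threshold, and the tie-breaking rule are assumptions, and reverse complements are ignored.

```python
from collections import Counter


def inchworm_contigs(reads, k=25, min_count=1):
    """Greedy k-mer extension in the spirit of the Inchworm steps (sketch)."""
    # Step 1: dictionary of k-mers with their occurrence counts.
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    # Step 2 (assumed rule): treat k-mers seen fewer than min_count times as errors.
    kmers = {kmer: c for kmer, c in counts.items() if c >= min_count}

    def rank(kmer):
        # Highest count wins; ties are broken lexicographically (an assumption).
        return (kmers[kmer], kmer)

    contigs = []
    while kmers:  # Step 6: repeat until every k-mer is used.
        # Step 3: seed with the most frequent remaining k-mer.
        seed = max(kmers, key=rank)
        del kmers[seed]
        contig = seed
        # Steps 4-5: extend to the right one base at a time.
        while True:
            suffix = contig[-(k - 1):]
            candidates = [suffix + b for b in "ACGT" if suffix + b in kmers]
            if not candidates:
                break
            best = max(candidates, key=rank)
            del kmers[best]
            contig += best[-1]
        # Steps 4-5 again, extending to the left.
        while True:
            prefix = contig[:k - 1]
            candidates = [b + prefix for b in "ACGT" if b + prefix in kmers]
            if not candidates:
                break
            best = max(candidates, key=rank)
            del kmers[best]
            contig = best[0] + contig
        contigs.append(contig)
    return contigs
```

Because each k-mer is removed as soon as it is used, sequence shared between isoforms ends up in only one contig, which is why alternative isoforms tend to appear only as their unique portions.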

Chrysalis

Step 2: Chrysalis

Group Inchworm contigs, construct de Bruijn
graph for each cluster.
Each connected component of the graph corresponds to one or more genes
with shared sequence.

Chrysalis procedure

1. Group contigs if they share perfect overlap of k − 1 bp (with reads supporting the overlap)
2. Build de Bruijn graph with k − 1 word size for nodes, k for edges; edges weighted by supporting reads
3. Assign each read to component with which it shares the largest number of k-mers
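
Steps 2 and 3 can be illustrated as follows. This is a rough sketch under simplifying assumptions, not Chrysalis itself: the helper names (`kmers_of`, `build_de_bruijn`, `assign_reads`) are hypothetical, edge weights are taken to be the number of times a k-mer occurs in the reads, and the contig grouping of step 1 is assumed to have been done already.

```python
from collections import Counter, defaultdict


def kmers_of(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_de_bruijn(contigs, reads, k):
    """Nodes are (k-1)-mers, edges are k-mers, weights count supporting read k-mers."""
    read_kmer_counts = Counter(km for r in reads for km in kmers_of(r, k))
    graph = defaultdict(dict)
    for contig in contigs:
        for km in kmers_of(contig, k):
            # Edge from the k-mer's (k-1)-prefix to its (k-1)-suffix.
            graph[km[:-1]][km[1:]] = read_kmer_counts[km]
    return graph


def assign_reads(reads, components, k):
    """Map each read to the component sharing the most k-mers with it."""
    comp_kmers = {
        name: {km for c in contigs for km in kmers_of(c, k)}
        for name, contigs in components.items()
    }
    assignment = {}
    for read in reads:
        read_kmers = set(kmers_of(read, k))
        best = max(comp_kmers, key=lambda name: len(read_kmers & comp_kmers[name]))
        # Reads sharing no k-mer with any component stay unassigned.
        assignment[read] = best if read_kmers & comp_kmers[best] else None
    return assignment
```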

Butterfly

Step 3: Butterfly

Traverse read-supported paths in each subgraph,
enumerate plausible sequences.

Butterfly procedure

1. Graph simplification: merge consecutive nodes in linear paths, pruning minor deviations
2. Plausible path scoring: identify paths in graph with read support
   Initialize DP table with source nodes (no incoming edges)
   Fill in table by extending path prefixes by one node
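
The table-filling idea in step 2 might look like the sketch below. It is only an approximation of Butterfly: read support is reduced to a per-edge weight compared against a hypothetical `min_support` threshold, whereas the real program checks support along longer stretches of a path. The graph is assumed to be acyclic.

```python
def enumerate_supported_paths(graph, min_support=1):
    """Enumerate source-to-end paths whose every edge has enough read support.

    graph: dict mapping node -> dict of successor -> read-support weight.
    """
    nodes = set(graph) | {v for succ in graph.values() for v in succ}
    has_incoming = {v for succ in graph.values() for v in succ}
    # Initialize the table with source nodes (no incoming edges).
    table = [[s] for s in sorted(nodes - has_incoming)]
    finished = []
    while table:
        next_table = []
        for prefix in table:
            successors = [
                v
                for v, weight in sorted(graph.get(prefix[-1], {}).items())
                if weight >= min_support
            ]
            if not successors:
                # The prefix cannot be extended any further: report it.
                finished.append(prefix)
            for v in successors:
                # Extend the path prefix by exactly one node.
                next_table.append(prefix + [v])
        table = next_table
    return finished
```

Raising `min_support` prunes weakly supported branches, so fewer candidate transcripts are reported.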

PASA

Program to Assemble Spliced Alignments
designed for ESTs and FL-cDNAs (pre-NGS era)
works on sequence alignments
computes consensus spliced alignments

PASA algorithm

Input: a set of spliced cDNA alignments A
Output: for each alignment a ∈ A, the largest assembly containing a
1. Sort alignments
2. Test overlapping alignments for compatibility
3. Build DP table, backtrace to find maximal assembly A∗
4. If ∃a ∉ A∗, build reciprocal DP table, trace to enumerate additional assemblies

Recurrences
L_a = \max_b \{ C_a, L_b + C_{a/b} \}

R_a = \max_b \{ C_a, R_b + C_{a/b} \}

L_a, R_a: maximum number of cDNAs in an assembly that contains alignment a, starting from left and right (respectively)
C_a: number of a-compatible alignments in the span of a
C_{a/b}: number of a-compatible alignments in the span of a but not in the span of b
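
The left-to-right recurrence can be transcribed almost literally into Python. The sketch below rests on several assumptions that are not on the slide: alignments are plain `(start, end)` intervals already sorted by start, the compatibility test is supplied by the caller as a hypothetical `compatible(x, y)` predicate, and b ranges over earlier alignments that are compatible with a.

```python
def pasa_left_scores(alignments, compatible):
    """Compute L_a for every alignment a, in sorted order (sketch)."""

    def in_span(x, a):
        # x lies entirely within the span of a.
        return a[0] <= x[0] and x[1] <= a[1]

    def count(a, b=None):
        # C_a when b is None; otherwise C_{a/b}.
        return sum(
            1
            for x in alignments
            if compatible(x, a)
            and in_span(x, a)
            and (b is None or not in_span(x, b))
        )

    scores = []
    for i, a in enumerate(alignments):
        best = count(a)  # the C_a term
        for j in range(i):
            b = alignments[j]
            if compatible(a, b):
                # the L_b + C_{a/b} term
                best = max(best, scores[j] + count(a, b))
        scores.append(best)
    return scores


def overlaps(x, y):
    """Toy compatibility test for unspliced alignments: the intervals overlap."""
    return x[0] <= y[1] and y[0] <= x[1]
```

For three staggered alignments `[(0, 10), (5, 15), (12, 20)]` with `overlaps` as the compatibility test, each alignment chains onto the previous one, giving scores of 1, 2 and 3. The R_a values would follow from the same loop run over the alignments in reverse order.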

Cufflinks

designed for short transcript reads (NGS era)
works on read alignments (mappings)
identifies the fewest transcripts that “explain” the read
mappings

Cufflinks algorithm
Input: overlap graph G of mapped reads
Output: a minimum path cover of G, with each path corresponding to a single assembled transcript
1. Alignments divided into non-overlapping loci
2. Erroneous read alignments removed
3. Compute transitive reduction of G
4. Construct bipartite graph G∗ from transitive closure of G, with edges weighted by coverage to “phase” distant exons by their coverage
5. Compute minimum-cost maximum matching in G∗, which corresponds to minimum path cover of G
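
The link between matching and path cover in steps 4 and 5 can be shown with a small unweighted example. This sketch is not the Cufflinks code: it ignores the coverage-based edge costs entirely, uses a simple augmenting-path matching, and the function names are hypothetical. Nodes stand for mapped reads, numbered 0 to n − 1, and the graph is assumed to be acyclic.

```python
def transitive_closure(n, edges):
    """reach[u] is the set of nodes reachable from u in a DAG with n nodes."""
    reach = [set() for _ in range(n)]
    for u, v in edges:
        reach[u].add(v)
    for k in range(n):  # Warshall's algorithm
        for i in range(n):
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach


def min_path_cover(n, edges):
    """Cover all nodes with the fewest paths, via maximum bipartite matching."""
    reach = transitive_closure(n, edges)
    match_right = [-1] * n  # match_right[v] = predecessor chosen for v

    def try_augment(u, seen):
        for v in reach[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    # Each matched pair (u, v) links u to v; unmatched nodes start a path.
    successor = {u: v for v, u in enumerate(match_right) if u != -1}
    paths = []
    for start in (v for v in range(n) if match_right[v] == -1):
        path = [start]
        while path[-1] in successor:
            path.append(successor[path[-1]])
        paths.append(path)
    return paths
```

Every matched edge merges two paths into one, so the number of paths equals n minus the size of the matching, and a maximum matching yields a minimum cover. With edges 0→1, 1→2 and 0→3, nodes 2 and 3 can never share a path, so two paths are needed.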

Discussion

Three different construction approaches

Butterfly: enumerate all plausible transcripts that have at least
minimal read support
PASA: for each alignment, find the largest assembly (transcript)
containing the alignment
Cufflinks: find the minimal assembl(y|ies) that explain the data,
using read coverage to “phase” distant exons

Next time: comparison of 8 Trinity assemblies

Four assembly settings
Butterfly
--PasaFly
--CuffFly
Butterfly, --min_kmer_cov 2

Two input data sets
Groomed data
Groomed data with digital normalization
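
The eight assemblies are simply the cross product of the four settings and the two inputs. A small sketch of that grid follows; the labels and the dataset names are hypothetical placeholders, and only the flags listed above are used.

```python
from itertools import product

# Extra command-line arguments for each assembly setting (from the list above).
settings = {
    "Butterfly": [],
    "PasaFly": ["--PasaFly"],
    "CuffFly": ["--CuffFly"],
    "Butterfly_mincov2": ["--min_kmer_cov", "2"],
}
# Placeholder names for the two input data sets.
datasets = ["groomed", "groomed_diginorm"]

runs = [
    {"label": f"{name}.{dataset}", "dataset": dataset, "extra_args": args}
    for (name, args), dataset in product(settings.items(), datasets)
]

for run in runs:
    print(run["label"], " ".join(run["extra_args"]))
```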

Hypotheses

(transcripts per assembly)

Butterfly > PasaFly > CuffFly
Diginorm > No diginorm

Thank you!
