Overview of Genome Assembly Algorithms

An overview of genome assembly algorithms, with some graph theory background, given as an invited lecture for a George Washington University course.


  1. Genome Assembly Algorithms and Software (or... what to do with all that sequence data?). Konstantinos Krampis, Asst. Professor, Informatics, J. Craig Venter Institute. George Washington University, Nov. 2nd 2011.
  2. Outline
     • Introduction: Why do we need genome assembly; Definitions of genome assembly
     • OLC: Overlap; Layout; Consensus; OLC assembly software and publications
     • Graph theory and assembly: Definition of a graph; Graphs and genome assembly
     • deBruijn - Euler: An alternative assembly graph; Constructing a de Bruijn graph from reads; Genome assembly from de Bruijn graphs; deBruijn assembly software and publications
  3. Introduction: Why do we need genome assembly
     • Cannot read the complete genome with the sequencer from one end to the other!
     • DNA isolated from a cell is amplified
     • Broken into fragments (shearing)
     • Fragments are "read" with the sequencer
     • Use the fragments (the reads) to reconstruct the genome
     • Credit: Masahiro Kasahara, Large-Scale Genome Sequence Processing, Imperial College Press
  4. Introduction: Definitions of genome assembly
     • Assembly: a hierarchical process to reconstruct the genome from reads
     • Assemble the puzzle of the genome from the reads: overlaps connect the pieces
     • Oversample the genome so that reads overlap
     • Key approach: a data structure representing overlaps, and algorithms operating on that data structure
     • Credit: Masahiro Kasahara, Large-Scale Genome Sequence Processing, Imperial College Press
  5. Introduction: Two major algorithmic paradigms for genome assembly
     • Overlap - Layout - Consensus (OLC): well established, more powerful method, but more difficult to implement
     • OLC: first to be used successfully for complex eukaryotic genomes (Drosophila, H. sapiens)
     • deBruijn - Euler: newer, easier to implement, problematic in complex genomes (for current implementations)
  6. OLC
     • Find Overlaps by aligning the sequences of the reads
     • Lay out the reads based on which aligns to which
     • Get the Consensus by joining all read sequences, merging overlaps
     • The sequencer reads in a random direction, left-to-right or right-to-left
     • Credit: Masahiro Kasahara, Large-Scale Genome Sequence Processing, Imperial College Press
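Because a read may come from either strand, an assembler generally has to compare each read both as given and as its reverse complement. A minimal sketch of that conversion follows; the function name and the plain-string representation of a read are assumptions made for illustration, not something taken from the slides.

```python
# Illustrative helper (hypothetical name): reads are assumed to be plain strings
# over the alphabet A, C, G, T.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(read: str) -> str:
    """Return the read as it would appear if sequenced from the opposite strand."""
    return read.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCC"))
```

Applying the function twice returns the original read, which is a convenient sanity check.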
  7. OLC: Overlap
     • Sequence alignment, all-against-all reads (Smith-Waterman, BLAST, other?)
     • Computationally intensive but easily parallelizable
     • Represent a read overlap by connecting the two reads with a directed link
     • First step in creating the genome assembly graph (more later)
     • Credit: Kececioglu and Myers 1995, Algorithmica 13:7-51
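The overlap step can be sketched as follows. This is a simplified illustration that assumes error-free reads and uses exact suffix-prefix matching in place of a real aligner such as Smith-Waterman; the function names, the `min_len` threshold, and the toy reads are all hypothetical.

```python
from itertools import permutations


def overlap_length(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when the longest match is shorter than min_len.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def overlap_graph(reads, min_len=3):
    """Directed overlap graph as a dict: (read_a, read_b) -> overlap length."""
    edges = {}
    for a, b in permutations(reads, 2):  # all-against-all, both orders
        size = overlap_length(a, b, min_len)
        if size:
            edges[(a, b)] = size
    return edges


reads = ["ATGGCGT", "GCGTACC", "TACCTTA"]
for (a, b), weight in overlap_graph(reads).items():
    print(a, "->", b, "overlap", weight)
```

Each pair is scored independently of every other pair, which is why this step parallelizes easily even though the number of comparisons grows quadratically with the number of reads.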
  8. OLC: Layout
     • Create a consistent, ideally linear, ordering of the reads
     • Remove redundancy, so no two dovetail edges leave the same read
     • No containment edge is followed by a dovetail edge
     • Remove cycles: one link in, one link out
  9. OLC: Consensus
     • Multiple sequence alignment (ClustalW) algorithms? No phylogeny here...
     • Vote for the most abundant nucleotide at each position
     • Incorporate read quality data
     • Create a pre-consensus from high-quality reads, and align the remaining reads to it
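The voting step can be illustrated with a small sketch. It assumes the layout has already assigned every read an offset on a common coordinate axis, that every position is covered by at least one read, and that there are no insertions or deletions; quality scores are ignored. The `consensus` function and the example layout are hypothetical.

```python
from collections import Counter


def consensus(layout):
    """Majority-vote consensus.

    layout: list of (offset, read) pairs placing each read on a shared axis.
    Assumes every column is covered by at least one read.
    """
    length = max(offset + len(read) for offset, read in layout)
    columns = [Counter() for _ in range(length)]
    for offset, read in layout:
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    # Pick the most abundant base in each column.
    return "".join(col.most_common(1)[0][0] for col in columns)


# The third read carries a sequencing error (C instead of G at position 5).
layout = [(0, "ATGGCGT"), (3, "GCGTACC"), (3, "GCCTACC"), (6, "TACCTTA")]
print(consensus(layout))
```

In the example the erroneous base is outvoted two to one, which is the reason for oversampling the genome.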
  10. OLC assembly software and publications: Celera Assembler
     • Developed at Celera Genomics for the first Drosophila and human genome assemblies
     • Continued development at the J. Craig Venter Inst. as an open source project
     • http://wgs-assembler.SourceForge.net (Licence: GPL)
     • Plenty of wiki (developer + user) documentation, examples, user forums
     • Other OLC implementations: Arachne, PCAP, Newbler, Phrap, TIGR Assembler
  11. OLC assembly software and publications: Celera Assembler publications
     • Myers et al (2000) A whole-genome assembly of Drosophila
     • Levy et al (2007) The diploid genome sequence of an individual human
     • Zimin et al (2009) The domestic cow, Bos taurus
     • Dalloul et al (2010) The domestic turkey, Meleagris gallopavo
     • Lorenzi et al (2010) New assembly of Entamoeba histolytica
     • Lawniczak et al (2010) Divergence in Anopheles gambiae
     • Jones et al (2011) The marine filamentous cyanobacterium Lyngbya majuscula
     • Miller et al, The Tasmanian devil, Sarcophilus harrisii
     • Prüfer et al, The great ape bonobo, Pan paniscus
     • Gordon et al, The cotton bollworm moth, Helicoverpa
  12. Graph theory and assembly: and now a bit of Graph Theory...
  13. Graph theory and assembly: Definition of a graph
     • Graph G with a set of vertices (nodes) V: {P,T,Q,S,R}
     • Set of edges (links between nodes) E: {(P,T),(P,Q),(P,S),(Q,T),(S,T),(Q,S),(S,Q),(Q,R),(R,S)}
     • Walk from P to R: (P,Q),(Q,R)
     • Walk from R to T: (R,S),(S,Q),(Q,T) or (R,S),(S,T)
     • Walk from R to P: not possible
     • Credit: Introduction to Graph Theory, Robert J. Wilson
  14. Graph theory and assembly: Definition of a graph
     • Trail: a walk of the graph where each edge is visited only once
     • Example trail: (P,Q), (Q,R), (R,S), (S,Q), (Q,S), (S,T)
     • Path: a walk where each vertex is visited only once
     • Example path: (P,Q), (Q,R), (R,S), (S,T)
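The definitions of walk, trail and path can be checked mechanically against the example graph from the previous slide. The sketch below treats the edges as directed pairs, as the slide's edge list suggests; the helper names are hypothetical, and each function expects a non-empty list of edges.

```python
# Edge set E of the example graph, taken as directed (from, to) pairs.
E = {("P", "T"), ("P", "Q"), ("P", "S"), ("Q", "T"), ("S", "T"),
     ("Q", "S"), ("S", "Q"), ("Q", "R"), ("R", "S")}


def is_walk(edges):
    """Every step is an edge of E and starts where the previous step ended."""
    return (all(e in E for e in edges)
            and all(a[1] == b[0] for a, b in zip(edges, edges[1:])))


def is_trail(edges):
    """A walk that uses no edge more than once."""
    return is_walk(edges) and len(set(edges)) == len(edges)


def is_path(edges):
    """A walk that visits no vertex more than once."""
    vertices = [edges[0][0]] + [e[1] for e in edges]
    return is_walk(edges) and len(set(vertices)) == len(vertices)


def reachable(src, dst):
    """True if some walk leads from src to dst (depth-first search)."""
    seen, stack = {src}, [src]
    while stack:
        v = stack.pop()
        for a, b in E:
            if a == v and b not in seen:
                seen.add(b)
                stack.append(b)
    return dst in seen


trail = [("P", "Q"), ("Q", "R"), ("R", "S"), ("S", "Q"), ("Q", "S"), ("S", "T")]
path = [("P", "Q"), ("Q", "R"), ("R", "S"), ("S", "T")]
print(is_trail(trail), is_path(trail), is_path(path), reachable("R", "P"))
```

The example trail is not a path because it passes through Q and S twice, and no walk from R to P exists because no edge points into P.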
  15. Graph theory and assembly: [figure-only slide] Credit: Saad Mneimneh, CUNY
  16. Graph theory and assembly: Graphs and genome assembly
     • Represent sequence overlaps as a graph with weighted edges
     • Shortest common superstring (SCS) solution: find a path that visits every vertex exactly once and maximizes the sum of edge weights
     • This is the Hamiltonian Cycle or Traveling Salesman Problem
  17. Graph theory and assembly: Which edge to start from? [figure panels] NO: misses a vertex. NO: misses the edge with large weight.
  18. Graph theory and assembly: [figure panel] YES!: all vertices and the edge with large weight.
  19. Graph theory and assembly: A more realistic version of a read / string overlap graph (C. jejuni). Credit: Eugene W. Myers, Bioinformatics 21:79-85
  20. Graph theory and assembly: Computational Complexity
     • Solving SCS by searching for a Hamiltonian Cycle on a graph is a difficult algorithmic problem (NP-hard)
     • Approximation or greedy algorithms can yield 2- to 4-approximation solutions (a string up to two or four times the length of the optimal, shortest string)
     • Transformation of the Overlap Graph to a String Graph leads to a polynomial-time solution (polynomial, P: O(n), O(n^2), O(n^3), etc.). No assembler implementation yet.
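A greedy approximation to the shortest common superstring can be sketched in a few lines: repeatedly merge the pair of strings with the largest overlap until one string remains. This is a toy illustration under the assumption of error-free reads on a single strand and a non-empty input; the function names are hypothetical, and the result is not guaranteed to be the optimal superstring.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_superstring(reads):
    """Greedy SCS heuristic: always merge the best-overlapping pair first."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads contained inside another read add nothing to the superstring.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best = (-1, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        merged = reads[i] + reads[j][size:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads[0]


print(greedy_superstring(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

Each round rescans all pairs, so the sketch runs in polynomial time, but a locally best merge can block a better global arrangement, which is where the approximation factor comes from.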
  21. deBruijn - Euler: An alternative assembly graph. Pevzner, Tang and Waterman, An Eulerian path approach to DNA fragment assembly, PNAS 98 (2001) 9748-9753.
  22. deBruijn - Euler: An alternative assembly graph. deBruijn graph: a directed graph representing overlaps between sequences of symbols. Credit: Wikipedia
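One common way to build such a graph from reads is to cut every read into k-mers, use the (k-1)-mers as nodes, and add an edge from the prefix to the suffix of each k-mer. The sketch below follows that convention; the function name, the choice to count each distinct k-mer once (ignoring multiplicity), and the toy reads are assumptions for illustration.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph as a dict: (k-1)-mer -> list of successor (k-1)-mers.

    Each distinct k-mer found in the reads contributes exactly one edge,
    from its first k-1 symbols to its last k-1 symbols.
    """
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


for node, successors in de_bruijn(["ATGGC", "GGCGT"], k=3).items():
    print(node, "->", successors)
```

Note that no read-against-read alignment is needed: two reads that share a k-mer are connected automatically because they contribute the same edge.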
  23. deBruijn - Euler: [figure-only slide, no text in the transcript]
  24. deBruijn - Euler: [figure-only slide, no text in the transcript]
  25. deBruijn - Euler: [figure-only slide, no text in the transcript]
  26. deBruijn - Euler: In a real genome scenario... [figure] Credit: Flicek and Birney 2009, Nature Methods 6, S6 - S12
  27. deBruijn - Euler: Euler's algorithm
     • Using Euler's algorithm we can find a path that visits each edge of the de Bruijn genome assembly graph exactly once, concatenating the edge labels to "spell out" the assembly
     • Polynomial time!
     • Credit: Wikipedia
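One standard way to find such a path is Hierholzer's algorithm, which runs in time linear in the number of edges. The sketch below is an illustration and not necessarily the procedure the slide had in mind. It assumes the graph is given as a dict from each node to a list of successors, that an Eulerian path actually exists (no validation is done), and that nodes are (k-1)-mers, so spelling the node labels along the path is equivalent to concatenating the k-mer edge labels. The function names are hypothetical.

```python
def eulerian_path(graph):
    """Hierholzer's algorithm on a dict: node -> list of successors (one per edge)."""
    out_edges = {node: list(succ) for node, succ in graph.items()}
    balance = {}  # out-degree minus in-degree for every node
    for node, succ in graph.items():
        balance[node] = balance.get(node, 0) + len(succ)
        for nxt in succ:
            balance[nxt] = balance.get(nxt, 0) - 1
    # Start where one more edge leaves than enters; otherwise start anywhere.
    start = next((n for n, b in balance.items() if b == 1), next(iter(graph)))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if out_edges.get(node):
            stack.append(out_edges[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def spell(path):
    """Concatenate overlapping node labels into one sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


graph = {"AT": ["TG"], "TG": ["GG"], "GG": ["GC"], "GC": ["CG"], "CG": ["GT"]}
print(spell(eulerian_path(graph)))
```

Every edge is pushed and popped exactly once, which is what makes this approach polynomial, in contrast to the Hamiltonian formulation of the overlap graph.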
  28. deBruijn - Euler: deBruijn assembly software and publications
     • Euler assembler (the very first), Pevzner et al 2001, PNAS 98:9748-9753
     • Velvet assembler (more user friendly)
     • Both of these assemblers store the complete graph in computer memory: 512GB-1024GB for human genomes
     • At JCVI we have two 1024GB (1TB) RAM servers for assembly
     • Others: ABySS, YAGA, Contrail-Bio, PASHA: parallel (distributed memory) assemblers on computer clusters
  29. Thank you!
     • Contact: kkrampis@jcvi.org
     • We hire interns at the J. Craig Venter Institute: http://www.jcvi.org/cms/education/internship-program/
     • Some of my other projects, Cloud Computing: http://tinyurl.com/cloudbiolinux-jcvi http://www.cloudbiolinux.org
