Learning to Love De Bruijn Graphs

Ben Woodcroft,
Australian Centre for Ecogenomics (ACE)
Winter School in Bioinformatics, 2015

K-mers and assembly
• For next-generation sequencing, comparison
of each read with each other read is
impossible.
– E.g. 10 million reads -> 10⁷ × 10⁷ = 10¹⁴ read-read
comparisons. Slowww...
• K-mers and de Bruijn graphs help make things
tractable
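As a rough illustration of why this helps, here is a minimal sketch of building a de Bruijn graph from reads. The function names and the toy reads are made up for illustration and are not taken from any particular assembler. Each read is visited once and broken into k-mers, so the work grows with the number of reads rather than with the number of read pairs.

```python
from collections import defaultdict

def kmers(read, k):
    """Yield every k-mer in a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            # An edge joins the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph

# Toy overlapping reads (hypothetical, not real data).
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Reads that overlap share k-mers, so they end up on the same path in the graph without ever being compared with each other directly.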

My favourite k-mer size
With a 100bp read, this can never happen with a k-mer size of 51

Fewer tips, more bubbles
As read lengths get longer, assemblers must move
from handling dead ends in the graph to handling
bubbles.
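To make the two structures concrete, here is a small sketch that finds dead ends and simple bubbles in a graph stored as a dict of node to successor set. The helper names, the `max_steps` limit and the toy graph are assumptions for illustration. Real assemblers use more elaborate criteria such as coverage and length thresholds.

```python
def dead_ends(graph):
    """Nodes with no outgoing edge (candidate tips or contig ends)."""
    nodes = set(graph) | {n for succs in graph.values() for n in succs}
    return {n for n in nodes if not graph.get(n)}

def walk_branch(graph, start, max_steps):
    """Follow a branch while each node has exactly one successor."""
    path = [start]
    node = start
    for _ in range(max_steps):
        succs = graph.get(node, set())
        if len(succs) != 1:
            break
        node = next(iter(succs))
        path.append(node)
    return path

def simple_bubbles(graph, max_steps=10):
    """Return (fork, merge) pairs where two branches from a fork meet again."""
    found = set()
    for fork, succs in graph.items():
        branches = [walk_branch(graph, s, max_steps) for s in sorted(succs)]
        for i in range(len(branches)):
            for j in range(i + 1, len(branches)):
                other = set(branches[j])
                shared = [n for n in branches[i] if n in other]
                if shared:
                    found.add((fork, shared[0]))
    return found

# Toy graph: A forks into B and C, which rejoin at D (a bubble);
# E forks into F and G, where G goes nowhere (a tip).
toy = {
    "A": {"B", "C"}, "B": {"D"}, "C": {"D"},
    "D": {"E"}, "E": {"F", "G"}, "F": {"H"},
}
print(dead_ends(toy))
print(simple_bubbles(toy))
```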

Metagenome assembly
Me: “I know, why don’t I just assemble all my
data together?”
Run assembly
Wait 4 days
Out of memory allocating 18.4 million terabytes
of RAM.

Solutions to RAM issues
• Quality trimming
• Hard trimming
• Throwing away a proportion of reads
randomly
• Sequencing something else
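The first three options can be sketched in a few lines. This is only an illustration: the function names, the quality threshold of 20 and the Phred+33 quality encoding are assumptions, and dedicated trimming tools do considerably more.

```python
import random

def hard_trim(seq, qual, length):
    """Keep only the first `length` bases of a read."""
    return seq[:length], qual[:length]

def quality_trim(seq, qual, min_q=20, offset=33):
    """Drop trailing bases whose quality is below min_q (Phred+33 assumed)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

def subsample(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]

print(hard_trim("ACGTACGT", "IIIIIIII", 5))
print(quality_trim("ACGTAC", "IIII##"))
print(len(subsample(list(range(1000)), 0.5)))
```

All three reduce the number of distinct k-mers the assembler has to hold in memory, either by removing error-prone bases or by removing reads outright.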

Lossy de Bruijn graphs
The number of k-mers observed is vanishingly small
relative to the total number of possible k-mers
The human genome: ~3Gbp = ~3×10⁹ k-mers
Total possible 51-mers: 4⁵¹ = ~10³⁰
0.00000000000000000002%
When making a list of k-mers, including a few extra
(false) ones probably has little effect on assembly.
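The arithmetic behind these numbers can be checked directly. This sketch uses the slide's own rough estimates (about 3×10⁹ k-mers in the human genome, k = 51), and the variable names are only illustrative.

```python
k = 51
genome_kmers = 3 * 10**9   # ~3 Gbp, so roughly 3e9 k-mers (rough estimate)
possible_kmers = 4**k      # four possible bases at each of k positions

fraction = genome_kmers / possible_kmers

print(f"possible {k}-mers: {possible_kmers:.2e}")
print(f"fraction observed: {fraction:.1e}")
print(f"as a percentage:   {fraction * 100:.1e} %")
```

Because the space of possible k-mers is so much larger than the set actually observed, a randomly chosen false k-mer is very unlikely to connect to anything real in the graph.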

Bloom filters
A low memory k-mer “store”

Is my k-mer in these reads?
From a Bloom filter, the answer is either “no” or
“probably”
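A minimal Bloom filter sketch shows where the "no" or "probably" answer comes from. The class, its parameters and the use of SHA-256 as the hash are assumptions chosen for clarity. Production k-mer counters use much faster hashes and carefully tuned sizes.

```python
import hashlib

class BloomFilter:
    """A fixed-size bit array plus several hash functions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Any unset bit means "no"; all bits set only means "probably".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter(num_bits=1024, num_hashes=3)
read = "ATGGCGTGCAA"  # toy read, not real data
k = 5
for i in range(len(read) - k + 1):
    bf.add(read[i:i + k])

print("ATGGC" in bf)  # a k-mer that was added is always reported present
```

A k-mer that was added is never reported absent, but an unseen k-mer can occasionally be reported present when other k-mers happen to have set all of its bits. The memory used is fixed by `num_bits`, no matter how many k-mers are added.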

A finishing approach to assembly
A central assumption of this method is
that the genome is “mostly” complete

Scaffolding without mate pair data

Gap filling vs. assembly
• Regular assembly ain’t easy
• Re-assembly is more straightforward because
you are trying to get to somewhere specific

Gap filling can correct assembly errors
• Contigs often contain errors right at their
ends
• By starting the search a bit back (e.g. 200bp)
from the end of the contig, these errors
can be overcome
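Having a destination turns gap filling into a path search. The sketch below is a generic breadth-first search between two anchor nodes in a de Bruijn graph and is not FinishM's actual implementation. The function names, the `max_nodes` limit and the toy graph are assumptions. In practice the start anchor would be taken a little inside one contig (e.g. 200bp back from its end) and the goal anchor inside the next contig.

```python
from collections import deque

def fill_gap(graph, start, goal, max_nodes=10000):
    """Breadth-first search for a path of nodes from start to goal."""
    queue = deque([start])
    came_from = {start: None}
    while queue and len(came_from) <= max_nodes:
        node = queue.popleft()
        if node == goal:
            # Walk back through came_from to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):
            if nxt not in came_from:
                came_from[nxt] = node
                queue.append(nxt)
    return None  # no connection found within the search limit

def path_to_sequence(path):
    """Spell out the sequence of a path of overlapping nodes."""
    return path[0] + "".join(node[-1] for node in path[1:])

# Toy graph of 3-mer nodes with one side branch (GGC -> GCA).
toy = {
    "ATG": {"TGG"}, "TGG": {"GGC"}, "GGC": {"GCG", "GCA"},
    "GCG": {"CGT"}, "GCA": {"CAT"},
}
path = fill_gap(toy, "ATG", "CGT")
print(path_to_sequence(path))
```

Because the search starts inside the contig rather than at its very last base, an erroneous contig end is simply bypassed if the reads support a different path to the goal.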

Gap-filling can account for strain
variation
github.com/wwood/finishm

Thanks!
• Slideshare.com/benjwoodcroft
• Github.com/wwood
• Ecogenomic.org
