Race against the sequencing machine: processing of raw DNA sequence data at the Genomics Core
by Dr. Luc Dehaspe - Genomics Core, UZ Leuven
To grow and function, living organisms unconsciously and continuously read instructions from the DNA sequence in each cell. Thanks to advances in DNA sequencing technology, scientists are increasingly able to consciously read along. In 2001, sequencing efforts resulted in a first draft of the human genome. Since then, the capacity of DNA reading machines has doubled every six months on average. While the first human genome sequencing project took years of worldwide collaboration, multiple genomes can now be sequenced in 10 days on a single machine at a service facility such as the Genomics Core.
Each sequencing run produces a few terabytes of raw data that must be processed with bioinformatics techniques in time, before the next batch of data arrives.
I will discuss bioinformatics techniques that are commonly used in the Genomics Core and that have a chance of surviving another generation of sequencing machines.
A crucial feature of these techniques is that they keep up with the sequencing machines by creating sub-tasks that are distributed over an extensible network of computers.
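As a rough illustration of the sub-task idea, the sketch below splits a list of reads into chunks, processes each chunk independently, and merges the partial results. It is not the Genomics Core pipeline: the function names, the GC-content task, and the toy reads are all hypothetical, and a local thread pool stands in for the network of computers.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Iterator, Sequence, Tuple


def split_into_chunks(reads: Sequence[str], chunk_size: int) -> Iterator[Sequence[str]]:
    """Scatter step: cut the reads into independent sub-tasks."""
    for start in range(0, len(reads), chunk_size):
        yield reads[start:start + chunk_size]


def count_gc(chunk: Sequence[str]) -> Tuple[int, int]:
    """Sub-task (hypothetical): return (G/C bases, total bases) for one chunk."""
    gc = sum(read.count("G") + read.count("C") for read in chunk)
    total = sum(len(read) for read in chunk)
    return gc, total


def gc_fraction(reads: Sequence[str], chunk_size: int, executor: Executor) -> float:
    """Gather step: run the sub-tasks on the executor and merge their counts."""
    partials = executor.map(count_gc, split_into_chunks(reads, chunk_size))
    gc = total = 0
    for chunk_gc, chunk_total in partials:
        gc += chunk_gc
        total += chunk_total
    return gc / total if total else 0.0


if __name__ == "__main__":
    # Toy data; a real run would stream reads from sequencer output files.
    toy_reads = ["ACGT", "GGCC", "ATAT", "CGCG", "TTAA"]
    # Assumption: a thread pool stands in for a cluster of worker machines.
    with ThreadPoolExecutor(max_workers=2) as pool:
        print(gc_fraction(toy_reads, chunk_size=2, executor=pool))
```

Because the sub-tasks share no state, adding workers only changes the executor that is passed in, which is the property that lets such a scheme grow along with the sequencing machines.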