Talk in gene discovery session at PAGXXII (https://pag.confex.com/pag/xxii/webprogram/Session2128.html)
Joint work with Jonas Behr, Gabriele Schweikert, Andre Kahles and others.
Abstract: High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and transcripts. However, the immense dynamic range of gene expression, the limitations and biases of the sequencing technology, and the observed complexity of the transcriptional landscape pose profound computational challenges. We discuss several of these challenges and, based on illustrative simulation examples, identify the limits of state-of-the-art tools in reconstructing multiple alternative transcripts even when sufficient information is provided. We propose a novel framework, called MiTie, for simultaneous transcript reconstruction and quantification based on combinatorial optimization. We use the negative binomial distribution to define a likelihood function and a regularization approach to select a small number of transcripts that quantitatively explain the observed read data. We show that the resulting regularized maximum likelihood problem can be formulated as a mixed integer programming (MIP) problem, which can be solved optimally using standard optimization approaches. We will also describe an extension of the discriminative gene finding system mGene that takes advantage of RNA-Seq reads. We demonstrate that the extended system, mGene.ngs, predicts transcript annotations significantly more accurately when using RNA-Seq data, and that it also outperforms transcriptome reconstruction tools that rely solely on RNA-Seq data. Finally, we illustrate how a combination of gene finding and transcriptome reconstruction methods like MiTie can be used to accurately annotate newly sequenced genomes without prior annotations.
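To make the idea of a regularized maximum likelihood over candidate transcripts more concrete, here is a minimal toy sketch. It is not MiTie's implementation: the function names, the three-segment example, the abundance grid, the dispersion value `r`, and the penalty weight `lam` are all assumptions made for illustration, and a brute-force enumeration stands in for the mixed integer programming solver mentioned in the abstract. The score is the negative binomial negative log-likelihood of the per-segment read counts plus a penalty on the number of transcripts with non-zero abundance.

```python
import itertools
import math


def nb_logpmf(k, mu, r):
    """Log-probability of count k under a negative binomial with mean mu
    and dispersion (size) parameter r."""
    if mu == 0:
        # With zero expected coverage, only a zero count is possible.
        return 0.0 if k == 0 else float("-inf")
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def select_transcripts(counts, transcripts, levels, r=10.0, lam=1.0):
    """Hypothetical helper: exhaustively search abundance assignments.

    counts      -- observed read count per segment
    transcripts -- one 0/1 tuple per candidate transcript (segment usage)
    levels      -- allowed abundance values (0 means "transcript not used")
    Returns (score, abundances) minimizing
        -log-likelihood + lam * (number of transcripts with abundance > 0).
    """
    best = None
    for abund in itertools.product(levels, repeat=len(transcripts)):
        # Expected count per segment is the summed abundance of the
        # transcripts that contain that segment.
        mu = [sum(a * t[s] for a, t in zip(abund, transcripts))
              for s in range(len(counts))]
        loglik = sum(nb_logpmf(k, m, r) for k, m in zip(counts, mu))
        score = -loglik + lam * sum(1 for a in abund if a > 0)
        if best is None or score < best[0]:
            best = (score, abund)
    return best


# Toy example (assumed data): three segments, three candidate transcripts.
candidates = [(1, 1, 1), (1, 0, 1), (1, 1, 0)]
score, abundances = select_transcripts([30, 10, 30], candidates,
                                       levels=(0, 10, 20, 30))
print(abundances)  # (10, 20, 0)
```

In this toy case the penalty drops the third candidate, and the remaining two transcripts reproduce the observed counts exactly. Exhaustive enumeration grows exponentially with the number of candidates, which is why a formulation that a dedicated solver can handle matters for realistic loci.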