Computational analysis of the modular architecture of secondary metabolite biosynthesis gene clusters
by Konrad Zych
The biosynthetic pathways of the great majority of secondary metabolites with pharmaceutical activities are encoded by large clusters of genes, termed secondary metabolite gene clusters (SMGCs). Within SMGCs, genes are further grouped into conserved multigene modules, each of which is responsible for the biosynthesis of one part of the end product. To gain deeper insight into the evolution of SMGCs and to open up new ways to discover novel compounds, we analysed the modularity of SMGCs computationally in a high-throughput fashion.
As the starting point of our analysis, we compiled a list of SMGCs from all available actinomycete nucleotide data using our previously published software pipeline antiSMASH. To identify all highly conserved modules, we then studied the co-conservation of genes within these gene clusters. Based on a classification of all SMGC genes into clusters of orthologous groups (COGs), we reconstructed two interaction networks: one linking COGs by gene synteny, and one linking them by co-localization of genes within the same gene clusters. A simple algorithm that overlaid the two networks could then identify highly connected motifs of COGs. These motifs represent conserved modules that can be directly linked to the chemical moieties of the secondary metabolite end product. Using these data, we identified a number of gene clusters with conserved architectures (module compositions) that had not been reported earlier and that may be responsible for the biosynthesis of compounds with novel chemical structures.
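The network overlay described above can be illustrated with a minimal sketch. It is not the authors' implementation: the input format (each gene cluster as an ordered list of COG identifiers), the `min_support` threshold, the function names, and the use of connected components as a stand-in for "highly connected motifs" are all assumptions made for illustration, and the COG labels are invented.

```python
from collections import Counter
from itertools import combinations


def build_networks(clusters):
    """Count synteny (adjacent genes) and co-localization (same cluster) edges."""
    synteny = Counter()
    coloc = Counter()
    for genes in clusters:
        # Synteny: COGs of genes that sit next to each other in a cluster.
        for a, b in zip(genes, genes[1:]):
            if a != b:
                synteny[frozenset((a, b))] += 1
        # Co-localization: every pair of distinct COGs found in the same cluster.
        for a, b in combinations(sorted(set(genes)), 2):
            coloc[frozenset((a, b))] += 1
    return synteny, coloc


def overlay(synteny, coloc, min_support=2):
    """Keep only edges supported in BOTH networks (hypothetical threshold)."""
    return {
        edge
        for edge in synteny
        if synteny[edge] >= min_support and coloc[edge] >= min_support
    }


def candidate_modules(edges):
    """Group COGs into connected components of the overlaid network."""
    adjacency = {}
    for edge in edges:
        a, b = tuple(edge)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = set()
    modules = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        modules.append(frozenset(component))
    return modules


# Toy input: three gene clusters, each an ordered list of invented COG labels.
clusters = [
    ["COG_A", "COG_B", "COG_C", "COG_X"],
    ["COG_A", "COG_B", "COG_C", "COG_Y"],
    ["COG_Z", "COG_A", "COG_B", "COG_C"],
]
synteny, coloc = build_networks(clusters)
modules = candidate_modules(overlay(synteny, coloc))
print(modules)
```

In this toy input, only the A-B and B-C edges recur in both networks, so the three COGs are grouped into a single candidate module. A real analysis would likely need a stricter notion of "highly connected" than connected components, for example a density or clique criterion.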
Our catalogue of conserved modules enables a detailed analysis of the evolution of SMGCs: the module compositions of homologous gene clusters from different species can be compared, and parsimony or likelihood methods can be used to infer the most probable evolutionary route that resulted in the observed variety of architectures. It also enables screening for novel types of gene clusters on an unprecedented scale. Finally, this new approach can be integrated into the antiSMASH pipeline to allow more detailed predictions of the secondary metabolite end products of unknown SMGCs from their module compositions.
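As a rough illustration of the parsimony idea, the sketch below applies Fitch's small-parsimony algorithm to the presence or absence of a single module across a species tree. The abstract does not say which parsimony method was used, so Fitch's algorithm is an assumption here, as are the tree shape, the species names, and the presence/absence data.

```python
def fitch(tree, leaf_states):
    """Fitch bottom-up pass on a binary tree given as nested 2-tuples.

    Returns (possible ancestral states at this node, minimum number of changes).
    Leaves are species names; leaf_states maps each name to True/False
    (module present or absent in that species' gene cluster).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, leaf_states)
    right_states, right_cost = fitch(right, leaf_states)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: one gain or loss must have occurred below this node.
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical species tree and module presence/absence pattern.
tree = (("species_1", "species_2"), ("species_3", "species_4"))
module_present = {
    "species_1": True,
    "species_2": True,
    "species_3": False,
    "species_4": True,
}
root_states, changes = fitch(tree, module_present)
print(root_states, changes)
```

For this toy pattern, the most parsimonious explanation is that the module was present in the common ancestor and lost once, on the lineage leading to `species_3`. Repeating the calculation per module would give a parsimony-based picture of how an architecture was assembled; a likelihood method would instead weigh gains and losses by estimated rates.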
© All Rights Reserved