Combining transcriptome assemblies from multiple de novo assemblers to generate full length RNA silencing gene transcripts in Nicotiana benthamiana
To produce an assembly that contains, but is not limited to, full-length RNA silencing gene transcripts, and thereby to make first-pass searches more informative and increase the chances of finding paralogous transcripts while limiting redundancy, we combined the sequences from multiple assemblies generated by four popular de novo transcriptome assemblers: Trans-ABySS, Trinity, SOAPdenovo-Trans and Oases. The subject organism is Nicotiana benthamiana, an allopolyploid plant.
Two methods were used to reduce the redundancy of the combined assemblies: a clustering-based approach (TGI clustering tools), and one that selects a 'best set' of mRNA sequences rather than producing the longest possible transcripts (EvidentialGene pipeline). Metrics used to assess assembly quality include the average length of the 1000 longest proteins, average bit-scores from BLAST comparisons against reference databases, and feature response curves.
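As an illustration of the first metric, the average length of the 1000 longest proteins can be computed from a protein FASTA file roughly as follows. This is a minimal sketch and not the code used in the study: the function names fasta_lengths and mean_length_of_longest are hypothetical, and stripping a trailing '*' stop symbol before counting residues is an assumption about how the predicted proteins are written.

```python
from statistics import mean


def fasta_lengths(lines):
    """Return the residue count of each record in an iterable of FASTA lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record.
            lengths.append(0)
        elif lengths:
            # Assumption: a trailing '*' marks a stop codon and is not a residue.
            lengths[-1] += len(line.rstrip("*"))
    return lengths


def mean_length_of_longest(lengths, n=1000):
    """Average length of the n longest sequences (all of them if fewer than n)."""
    top = sorted(lengths, reverse=True)[:n]
    return mean(top) if top else 0.0


if __name__ == "__main__":
    # Toy records for demonstration only; not data from the study.
    demo = [">p1 toy", "MKT", "AA*", ">p2", "MK", ">p3", "MKTAYIAK"]
    print(mean_length_of_longest(fasta_lengths(demo), n=2))
```

For a real assembly, the same functions could be applied to an open file handle, for example `fasta_lengths(open("proteins.faa"))`, where the file name is a placeholder.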
By combining the output of different assemblers run with varying k-mer sizes and input read counts, we were able to detect all 35 query RNA silencing gene transcripts as full length from simple first-pass BLAST searches. Previously, only 24 RNA silencing transcripts could be detected as complete using a single assembler. While the TGI clustering tool could produce longer transcripts, the average bit-scores of BLAST searches and the feature response curves show that the EvidentialGene pipeline produced higher-quality assemblies.
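The kind of first-pass check described above can be sketched from tabular BLAST output. The sketch assumes the search was run with the custom tabular format "6 qseqid sseqid qlen qstart qend bitscore", and it treats a query as full length when its best hit covers at least 95% of the query; that threshold, the column layout and the function names are illustrative assumptions, not the criteria used in the study.

```python
from statistics import mean

# Assumed column order of the tabular BLAST output.
COLUMNS = ["qseqid", "sseqid", "qlen", "qstart", "qend", "bitscore"]


def best_hits(rows):
    """Keep the highest bit-score hit for each query sequence."""
    best = {}
    for row in rows:
        hit = dict(zip(COLUMNS, row))
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query]["bitscore"]:
            start, end = int(hit["qstart"]), int(hit["qend"])
            best[query] = {
                "subject": hit["sseqid"],
                "bitscore": score,
                # Fraction of the query spanned by the alignment.
                "coverage": (abs(end - start) + 1) / int(hit["qlen"]),
            }
    return best


def full_length_queries(best, min_coverage=0.95):
    """Queries whose best hit spans at least min_coverage of the query (assumed cutoff)."""
    return sorted(q for q, h in best.items() if h["coverage"] >= min_coverage)


def mean_best_bitscore(best):
    """Average bit-score over the best hit of every query."""
    return mean(h["bitscore"] for h in best.values()) if best else 0.0


if __name__ == "__main__":
    # Toy rows for demonstration only; not results from the study.
    rows = [
        ["query_A", "contig_1", "1000", "1", "1000", "1800.0"],
        ["query_A", "contig_2", "1000", "1", "600", "900.0"],
        ["query_B", "contig_3", "1400", "200", "1000", "1200.0"],
    ]
    hits = best_hits(rows)
    print(full_length_queries(hits), mean_best_bitscore(hits))
```

Rows from a real results file could be supplied with `csv.reader(handle, delimiter="\t")`. Comparing the number of full-length queries and the mean best bit-score across assemblies gives a simple side-by-side view of the two quantities discussed above.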
By using a combined-assemblies approach, as recommended by the EvidentialGene pipeline, one can recover more completely assembled transcripts while limiting redundancy and maximising the quality of the assembly.