Blast_anotherFalseHit

This document describes the process followed to analyze the genomes of Pneumocystis species in order to elucidate their putative mating system. The approach used Schizosaccharomyces pombe as a reference genome to search for orthologs in the target genomes through similarity searches, domain architecture analysis, and phylogenetic analysis. Key findings included the lack of elements of the RNA interference pathway and of cell fusion/meiosis regulation in Pneumocystis, providing evidence that they may have a primary homothallic mating system. The document also discusses the method's limitations and the need to reassess some existing annotations.

Don’t move! You must be the ortholog we’ve been
looking for..
surely it must be a case
of shared domains

Now you see that
similarity is hardly
enough…
What a terrible
mistake to make! ….
…Oh, blast!...
It was another
false hit!

Reminiscing
• Why
• Elucidation of the putative mating system for Pneumocystis.
• Available genomes
• Rationale
• A ‘Taphrina paper’-like approach was initially envisaged
• Arbitrary list of genes of interest based on published literature
• Gene location by similarity to a bait protein reference
• Tentative exon mapping by hand
• Functional assessment
• A (not so) close (, but very) well annotated reference
• Schizosaccharomyces pombe
• Assumptions
• The existence of three close genomes allowed for
• Inference of non-existence whenever absent on all
• P. carinii - rats
• P. jirovecii – humans
• P. murina – mice
• Taphrina deformans – peach tree
• Saitoella complicata - saprophyte
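
The assumption listed above (a gene is inferred absent only when it is missing from all three close Pneumocystis genomes) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the three outcome labels are assumptions made for the example.

```python
# Illustrative sketch: the labels and function name are hypothetical.
PNEUMOCYSTIS = ("P. carinii", "P. jirovecii", "P. murina")

def infer_absence(found_in):
    """Classify a gene from per-genome search results.

    `found_in` maps each species name to True (an ortholog candidate was
    found) or False (nothing was found).
    """
    calls = [found_in[species] for species in PNEUMOCYSTIS]
    if all(calls):
        return "present"
    if not any(calls):
        # Missing from all three close genomes: absence is inferred.
        return "inferred absent"
    # Found in some genomes only: could be an assembly gap or a missed hit.
    return "inconclusive"
```

A gene missing from only one or two genomes is deliberately left unresolved here, since a single incomplete assembly would otherwise produce a false absence call.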

Looking for the Road Ahead
• Similarity based methods (TBLASTN)
• Specifics
• Protein sequence as bait
• S. pombe @ UniProt
• Genome sequence as substrate
• Scaffolds|contigs library
• Limitations
• Unspecific hits
• Highly divergent sequences
• No hits expected
• No statistical model available
• Gene relevance
• To know the inner workings of the relevant process(es)
• To detect putative choke points
• To navigate a sea of heterogeneous names
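
As a rough sketch of the search step above, the following builds a TBLASTN command line (protein bait against a scaffold/contig database) and parses its default tabular output. The file name, database name, and E-value used in any call are placeholders, and the command is only assembled, not executed.

```python
import csv
import io

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def tblastn_command(query_fasta, db_name, evalue=10.0):
    """Build (but do not run) a tblastn call; the arguments are placeholders."""
    return ["tblastn", "-query", query_fasta, "-db", db_name,
            "-evalue", str(evalue), "-outfmt", "6"]

def parse_hits(tabular_text):
    """Parse tabular hits into dictionaries with numeric coordinates."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        for key in ("length", "qstart", "qend", "sstart", "send"):
            hit[key] = int(hit[key])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        hits.append(hit)
    return hits
```

The parsed hits serve only to locate candidate genomic regions; whether a region really holds an ortholog is decided by later steps.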

Navigating Troubled Waters
• Final solution
• Phylogenetic analysis
• Labour+computation intensive
• Restricted set of sequences
• Sensible route
• Domain architecture analysis
• Faster & Sound
• Based on alignments of all the accepted cases
• Modular nature of proteins
• Underlying, but transparent phylogenetics
• Stable & Reusable
• Location by prototypical synteny
• Too short
• Too divergent
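
The domain architecture comparison above can be sketched as an ordered comparison of domain accessions. This is a simplified illustration: the accessions in any call are placeholders, and a real comparison would work on InterProScan results rather than hand-written tuples.

```python
def architecture(domains):
    """Return the domain accessions ordered by start position.

    `domains` is a list of (accession, start, end) tuples for one protein.
    """
    return tuple(acc for acc, _start, _end in sorted(domains, key=lambda d: d[1]))

def same_architecture(candidate, reference):
    """True when both proteins carry the same domains in the same order."""
    return architecture(candidate) == architecture(reference)
```

Two reference proteins with identical architectures cannot be told apart this way, which is why such cases still fall back on phylogenetic analysis.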

Cell Designer 4.3
The Reference
• Schizosaccharomyces pombe
• Proteome as gateway to genome annotation
• Swiss-Prot grade annotation
• Set of more than 5000 genes
• Availability of functional annotation
• Published references
• Referenced papers
• Papers located by textual search
• KEGG PATHWAY
• Sparsely used

Looking for Orthologs
[Flowchart of the annotation protocol. Node labels: S. pombe query sequence; Target genome; TBLASTN; Genomic regions; Target genome annotation (yes/no); CDS translation; Genewise and manual curation; InterProScan (target domains or domain architecture); InterProScan (S. pombe specific domains or domain architecture); three "Match?" decision points (yes/no); Phylogenetic analysis; outcomes "Homolog found" and "No homolog found".]

Objective in Sight?
• Method Performance
• Split across contigs
• e.g. msh2

Going Deeper
• Method Performance
• Reassessment of annotated genes
• e.g. ste23@Pjiro

Out of Trouble?
• Method Limitations
• Annotation ambiguity
• e.g. rad24 & rad25
Phylemon2::Phymlbestaictree

A Question of Entourage
• Small highly divergent genes
• Small MAT genes through synteny

Major Findings (Pneumocystis)
• Considering Schizosaccharomyces pombe
• RNA interference (RNAi) pathway
• Lacks crucial elements (e.g.)
• dcr1
• ARC Complex
• etc
• Cell Fusion & Meiosis Onset Regulation
• Missing assorted elements
• Pheromone Action
• Incomplete signal transmission component set
• Signal transduction seems to be all present
• Cell Cycle Regulation by Environmental Factors
• Missing some of the putative crucial elements (e.g.)
• wis4, wis1
• hog1
• atf1
• rst2

Conclusions
• Pneumocystis species
• Probable primary homothallism
• Only two different, and incomplete MAT ‘cassette’ candidates
• In the same scaffold
• No conventional silencing RNA interference pathway
• Additional argument in favor
• Further study required
• Sequencing of further clinical isolates seems to support this
• Existing annotation
• Should be assessed depending on its quality grade
• Odd situations should be reappraised

Editor's Notes

Good afternoon. Thank you for attending this talk about…
…what can rightfully be titled the making of the paper shown here.
This study began in the wake of a previous paper that accompanied the release of the annotated genome of Taphrina deformans. The initial challenge was to apply the same approach to locate and characterize the MAT genes, and some other sex-related genes, in the sequenced Pneumocystis genomes. The existence of three sequenced genomes and a well-annotated reference would allow some bold assumptions to be made, namely the inference of absence for a given gene.
The methodology would involve pinpointing genes by similarity to a reference protein sequence, manual fitting to map the CDS, and confirmation through the functional annotation of the gene product. The smallest MAT genes were to be located through relative position and synteny. A multitude of unspecific BLAST hits, together with the lack of a clear assessment of the role and relevance of many of the genes listed as of interest, led to a complete reappraisal of the approach. To avoid misunderstandings about gene symbols, the S. pombe (Schpo) nomenclature, as present in UniProt/Swiss-Prot, was adopted.
The main cause of hit unspecificity was readily traced to the fact that many of the genes studied share related domains, as can be seen here for the SPK1 protein. Several good hits across the target scaffolds are the result of a domain present in many different known architectures. Ignore this common fact at your own peril, and expect to find yourself in a disturbing maze of mirrors…
To overcome this problem, gene annotation had to be made independent of plain sequence similarity methods. Phylogenetic analysis would undoubtedly be the definitive approach, but because it is very demanding in terms of computation and of expert interaction with the results, it can only be applied to restricted sets of sequences. In this case protein domain architecture analysis is clearly the sensible route. It is based on the underlying alignment of all the accepted members of a given domain, and these alignments are usually broader than anything the regular phylogenetics user can envisage. … In this way BLAST hits are used just as a broad location tool.
The availability of a seemingly complete and well-annotated proteome opened the door to an assessment of the regulatory processes of interest. This allowed for a more precise appraisal of the role each gene plays, and of its relevance to the whole. Missing information was brought into this analysis from published references, and the remaining gaps were filled with some data collected from KEGG Pathway.
The final annotation protocol amounted to this diagram. … Please note that this approach, where ortholog identification is made independent of the similarity search, delivers the user from the shackles of a given E-value threshold. This means that TBLASTN searches can be carried down to almost preposterous E-value levels in order to find the faintest of similarity signals. Hit validation is carried out independently, so it is not affected by the significance level of the TBLASTN hit.
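
A minimal sketch of that independence, assuming each hit has already been turned into a predicted product with a domain architecture (the region labels, accessions, and the lookup table are all hypothetical): hits are kept or rejected on architecture alone, and the E-value never enters the decision.

```python
def validate_hits(hits, reference_architecture, predicted_architectures):
    """Keep hits whose predicted product shares the reference architecture.

    `hits` is a list of dicts with at least a "region" key.
    `predicted_architectures` maps a region label to a tuple of domain
    accessions (a stand-in for the gene prediction and domain scan steps).
    The "evalue" of a hit is deliberately ignored.
    """
    validated = []
    for hit in hits:
        predicted = predicted_architectures.get(hit["region"])
        if predicted == reference_architecture:
            validated.append(hit)
    return validated
```

With this arrangement a hit with a very poor E-value can still be accepted, while a highly significant hit caused by a single shared domain is rejected.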
Such a protocol enabled us to get ortholog candidates even when they are dispersed among different unordered contigs, as is the case for P. carinii. As an example we may consider the DNA mismatch repair protein MSH2. As you can see, the functional annotation for the product of the concatenated exons found is not only very similar to the annotation of the COOH end of MSH2 in S. pombe, but even matches the relevant PANTHER subfamily. The pattern of exon distribution among the genomes studied shows a typical increase in CDS fragmentation from S. complicata towards Pneumocystis.
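
One way to picture the handling of a gene split across contigs is to order the matching fragments by their position on the query protein and check how much of the query they cover. The sketch below is only an illustration with made-up contig names and coordinates; it is not the procedure used for MSH2.

```python
def order_fragments(fragments):
    """Sort (contig, qstart, qend) fragments by their start on the query."""
    return sorted(fragments, key=lambda fragment: fragment[1])

def query_coverage(fragments, query_length):
    """Fraction of query residues covered by at least one fragment.

    Coordinates are 1-based and inclusive, as in BLAST tabular output.
    """
    covered = set()
    for _contig, qstart, qend in fragments:
        covered.update(range(qstart, qend + 1))
    return len(covered) / query_length
```

Fragments ordered this way suggest how exons found on separate contigs could be concatenated before functional annotation of the product.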
The same protocol enabled us to review the existing annotation for the genomic regions where the hits were found. The case shown presents the A-factor processing enzyme STE23 from S. cerevisiae (no evident ortholog in S. pombe, maybe YAN2). In the genomic sequence of P. jirovecii the TBLASTN hit pointed to a region already occupied by two shorter genes. As these genes presented no meaningful functional annotation, and Genewise was able to match STE23 to exons very convincingly, the case for an annotation review was very compelling. Moreover, both the length and the domain architecture were those expected for STE23.
Of course danger is always around the corner, and situations emerged where even the reference architecture proved ambiguous. An example of this is the pair of DNA damage checkpoint proteins RAD24 and RAD25. Their architecture is identical, and even when a phylogenetic analysis is carried out it is very difficult to assign the homologs found to either of the prototypes. Moreover, the branching posterior probability and the exon structure seem to point in different directions.
The larger MAT genes were easily found by the standard protocol, but their smaller and more divergent neighbours had to be located through relative position and assessed by synteny. From the several putative candidates for matMi, the largest were chosen, and they appear to share a common trait: matching the signal peptide signature at the N-terminal (NH2) end. The matPc site in Pneumocystis was found to be consistently occupied by an hsp104 ortholog, and no other candidates were found. The original annotation involved a very large gene that, after review, was split into the hsp104 and end4 genes.
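
Location by relative position can be sketched as looking up what lies between two anchor genes that are easy to find. The gene names below are placeholders and the ordered gene list is assumed to be available for the scaffold; this is an illustration, not the procedure behind the matMi and matPc calls.

```python
def genes_between(gene_order, left_anchor, right_anchor):
    """Return the genes lying strictly between two anchors on a scaffold.

    `gene_order` lists gene names in their order along the scaffold.
    The anchors may be given in either order.
    """
    first, second = sorted((gene_order.index(left_anchor),
                            gene_order.index(right_anchor)))
    return gene_order[first + 1:second]
```

Any candidates returned this way would still need to be assessed, for example by length or by the presence of a signal peptide signature.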
For Pneumocystis several pathways seem to be impaired if the S. pombe reference proves valid. The most notable is the RNA interference pathway. Other circuits important to sexual reproduction also seem to be affected, namely the control of the onset of meiosis. These findings would allow for some interesting conclusions, but some effort has to be put into finding possible circumventing pathways that could be at play.
