Don’t move! You must be the ortholog we’ve been
looking for...
Surely it must be a case
of shared domains
Now you see that
similarity is hardly
enough…
What a terrible
mistake to make! ….
…Oh, blast!...
It was another
false hit!
The Making Of
Reminiscing
• Why
• Elucidation of the putative mating system for Pneumocystis.
• Available genomes
• Rationale
• A ‘Taphrina paper’-like approach was initially envisaged
• Arbitrary list of genes of interest based on published literature
• Gene location by similarity to a bait protein reference
• Tentative exon mapping by hand
• Functional assessment
• A (not so) close, but (very) well-annotated, reference
• Schizosaccharomyces pombe
• Assumptions
• The existence of three close genomes allowed for
• Inference of absence whenever a gene is missing from all three
• P. carinii – rats
• P. jirovecii – humans
• P. murina – mice
• Taphrina deformans – peach tree
• Saitoella complicata – saprophyte
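The absence rule in the assumptions above can be made explicit. The Python sketch below is illustrative only: the three genome labels come from the slide, but the function name, the input format, and the extra "patchy" and "inconclusive" categories are assumptions added here, not part of the original workflow.

```python
# Hypothetical sketch of the "inference of absence" assumption: a gene is
# called absent only when no candidate was found in any Pneumocystis genome.
PNEUMOCYSTIS_GENOMES = ("P. carinii", "P. jirovecii", "P. murina")


def infer_status(hits_by_genome):
    """Classify a gene from per-genome search results.

    hits_by_genome maps genome name -> True if a candidate was found.
    A genome missing from the mapping counts as 'not searched', which
    makes the call inconclusive rather than absent (an assumption).
    """
    if any(g not in hits_by_genome for g in PNEUMOCYSTIS_GENOMES):
        return "inconclusive"
    found = [hits_by_genome[g] for g in PNEUMOCYSTIS_GENOMES]
    if all(found):
        return "present"
    if not any(found):
        return "absent"
    # Found in only some genomes: assembly gap or genuine loss, undecided.
    return "patchy"
```

Requiring all three genomes to agree is what makes an absence call "bold but defensible": a single incomplete assembly cannot produce it on its own.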
Looking for the Road Ahead
• Similarity based methods (TBLASTN)
• Specifics
• Protein sequence as bait
• S. pombe @ UniProt
• Genome sequence as substrate
• Scaffolds|contigs library
• Limitations
• Unspecific hits
• Highly divergent sequences
• No hits expected
• No statistical model available
• Gene relevance
• To understand the inner workings of the relevant process(es)
• To detect putative choke points
• To navigate a sea of heterogeneous names
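A similarity search of this kind, with an S. pombe protein as bait against a scaffold/contig library, typically yields TBLASTN hits in BLAST+ tabular format (`-outfmt 6`, whose 12 default columns are used below). The following sketch, under that assumption, parses such output and groups hits by scaffold into candidate genomic regions. The `max_evalue` default of 10 mirrors the BLAST+ default expect threshold; the class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass
class Hsp:
    query: str
    scaffold: str
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float

    @property
    def strand(self):
        # In TBLASTN output, sstart > send means a reverse-strand hit.
        return "+" if self.sstart <= self.send else "-"


def parse_outfmt6(lines):
    """Yield one Hsp per non-empty, non-comment line of -outfmt 6 text."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        yield Hsp(row["qseqid"], row["sseqid"],
                  int(row["qstart"]), int(row["qend"]),
                  int(row["sstart"]), int(row["send"]),
                  float(row["evalue"]), float(row["bitscore"]))


def regions_by_scaffold(hsps, max_evalue=10.0):
    """Group HSPs by scaffold, deliberately keeping weak hits."""
    regions = defaultdict(list)
    for hsp in hsps:
        if hsp.evalue <= max_evalue:
            regions[hsp.scaffold].append(hsp)
    # Sort each scaffold's HSPs by leftmost genomic coordinate.
    for hsp_list in regions.values():
        hsp_list.sort(key=lambda h: min(h.sstart, h.send))
    return dict(regions)
```

A permissive cutoff only makes sense if the hits are validated by some other means afterwards. TBLASTN itself must also be run with a matching `-evalue`, since it never reports hits above its own threshold.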
Genuine?...
Navigating Troubled Waters
• Final solution
• Phylogenetic analysis
• Labour- and computation-intensive
• Restricted set of sequences
• Sensible route
• Domain architecture analysis
• Faster & Sound
• Based on alignments of all the accepted cases
• Modular nature of proteins
• Underlying, but transparent phylogenetics
• Stable & Reusable
• Location by prototypical synteny
• Too short
• Too divergent
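Domain architecture analysis rejects hits that merely share one domain with the bait. A minimal sketch follows, assuming each protein's domain calls are available as `(domain_id, start, end)` tuples, for instance parsed from a domain-annotation tool. The domain labels in any usage are hypothetical, and this sketch does not resolve overlapping calls.

```python
def architecture(domains, collapse_repeats=True):
    """Return the ordered tuple of domain IDs along a protein.

    domains: iterable of (domain_id, start, end) tuples.
    Tandem repeats of the same domain are optionally collapsed to one.
    """
    ordered = [d for d, _, _ in sorted(domains, key=lambda x: (x[1], x[2]))]
    if collapse_repeats:
        collapsed = []
        for d in ordered:
            if not collapsed or collapsed[-1] != d:
                collapsed.append(d)
        ordered = collapsed
    return tuple(ordered)


def same_architecture(query_domains, target_domains):
    """True when both proteins show the same ordered domain architecture."""
    return architecture(query_domains) == architecture(target_domains)
```

A target that carries only one of the reference domains is rejected, which is precisely the "shared domains" false hit the opening slides warn about.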
CellDesigner 4.3
The Reference
• Schizosaccharomyces pombe
• Proteome as gateway to genome annotation
• Swiss-Prot grade annotation
• Set of over 5,000 genes
• Availability of functional annotation
• Published references
• Referenced papers
• Papers located by textual search
• KEGG PATHWAY
• Sparsely used
Looking for Orthologs
[Flowchart: S. pombe query sequence → TBLASTN against the target genome → genomic regions → target genome annotation available? (yes: CDS translation; no: Genewise and manual curation) → InterProScan → target domains or domain architecture compared with S. pombe-specific domains or domain architecture (Match?) → phylogenetical analysis (Match?) → homolog found / no homolog found]
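The decision logic of this flowchart can be sketched in Python. This is one possible reading, not the authors' code: every callable is a hypothetical stand-in (`predict_protein` for Genewise plus manual curation, `architecture_of` for an InterProScan-derived architecture, `phylogeny_supports` for the phylogenetic analysis). The rule that only a partial domain overlap triggers phylogenetics is also an assumption.

```python
from typing import Callable, Optional


def classify_candidate(
    region: str,
    annotated_protein: Optional[str],
    predict_protein: Callable[[str], str],
    architecture_of: Callable[[str], tuple],
    reference_architecture: tuple,
    phylogeny_supports: Callable[[str], bool],
) -> str:
    """Classify one TBLASTN-located genomic region (hypothetical sketch)."""
    # Reuse the existing annotation when the target genome has one;
    # otherwise predict the protein from the genomic region.
    if annotated_protein is not None:
        protein = annotated_protein
    else:
        protein = predict_protein(region)
    architecture = architecture_of(protein)
    # Validation does not look at the TBLASTN e-value that found the region.
    if architecture == reference_architecture:
        return "homolog found"
    # Assumption: a partial domain overlap is settled by phylogenetics.
    if set(architecture) & set(reference_architecture) and phylogeny_supports(protein):
        return "homolog found"
    return "no homolog found"
```

Passing the tools in as functions keeps the control flow visible and testable without running Genewise, InterProScan, or a tree builder.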
Objective in Sight?
• Method Performance
• Split across contigs
• e.g. msh2
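When a candidate is split across unordered contigs, its fragments can still be ordered along the query protein. The sketch below assumes each TBLASTN HSP has been reduced to a hypothetical `(contig, qstart, qend, peptide)` tuple, where the peptide is the translated target segment. It chains the fragments by query start and rejects large query overlaps, which suggest duplicated or paralogous hits rather than exons of one gene. The `max_overlap` default is arbitrary.

```python
def chain_fragments(fragments, max_overlap=10):
    """Order hit fragments from several contigs along the query protein.

    fragments: list of (contig, qstart, qend, peptide) tuples.
    Returns (ordered_fragments, concatenated_peptide). Small overlaps up to
    max_overlap residues are tolerated and NOT trimmed; larger overlaps on
    the query raise ValueError.
    """
    ordered = sorted(fragments, key=lambda f: f[1])
    for prev, cur in zip(ordered, ordered[1:]):
        # Inclusive coordinates: positive value = residues covered twice.
        overlap = prev[2] - cur[1] + 1
        if overlap > max_overlap:
            raise ValueError(f"{prev[0]} and {cur[0]} overlap by {overlap} residues")
    return ordered, "".join(f[3] for f in ordered)
```

The concatenated peptide is what would then go through domain annotation. The real MSH2 case was validated that way, and this sketch only covers the assembly step.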
Going Deeper
• Method Performance
• Reassessment of annotated genes
• e.g. ste23@Pjiro
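Reassessing existing annotation starts with finding which current gene models sit under a new hit. The sketch below is a hypothetical heuristic, not the published criterion. It flags a region for review when two or more existing models overlap the hit, the newly predicted protein length is close to the reference length, and every existing model is shorter than the prediction. Gene models are assumed to be `(gene_id, start, end, protein_length)` tuples with 1-based inclusive coordinates, and the 20% tolerance is arbitrary. Functional annotation and domain architecture, which also mattered in the ste23 case, are not modelled.

```python
def overlapping_models(hit_start, hit_end, gene_models):
    """Existing gene models overlapping a hit region on the same scaffold."""
    lo, hi = min(hit_start, hit_end), max(hit_start, hit_end)
    return [g for g in gene_models if g[1] <= hi and g[2] >= lo]


def flag_for_review(hit_start, hit_end, gene_models,
                    predicted_len, reference_len, tolerance=0.2):
    """True when several shorter models occupy a region whose new
    prediction matches the reference length (hypothetical heuristic)."""
    models = overlapping_models(hit_start, hit_end, gene_models)
    fits_reference = abs(predicted_len - reference_len) <= tolerance * reference_len
    all_shorter = all(m[3] < predicted_len for m in models)
    return len(models) >= 2 and fits_reference and all_shorter
```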
Out of Trouble?
• Method Limitations
• Annotation ambiguity
• e.g. rad24 & rad25
Phylemon2::Phymlbestaictree
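Ambiguity between near-identical prototypes such as rad24 and rad25 can be made visible before any tree is built. The sketch below is a crude stand-in and not the maximum-likelihood analysis named above. It assigns a pre-aligned candidate to the prototype with the highest percent identity and returns `None` when the two best scores are within a hypothetical `min_margin`. Sequences in any usage are invented.

```python
def percent_identity(a, b):
    """Identity over aligned columns where neither sequence has a gap.

    Both sequences must come from the same alignment (equal length).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def assign_to_prototype(candidate, prototypes, min_margin=5.0):
    """Closest prototype name, or None when the top two are too close."""
    scores = sorted(((percent_identity(candidate, seq), name)
                     for name, seq in prototypes.items()), reverse=True)
    if len(scores) > 1 and scores[0][0] - scores[1][0] < min_margin:
        return None  # ambiguous, rad24/rad25-like case
    return scores[0][1]
```

A `None` result here corresponds to the situation on the slide: similarity alone cannot decide, and tree topology, support values, and exon structure have to be weighed instead.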
A Question of Entourage
• Small, highly divergent genes
• Small MAT genes located through synteny
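Locating small genes by relative position amounts to listing the genes between two anchors on a scaffold and choosing among them. The sketch below assumes a scaffold given as an ordered list of hypothetical `(gene_id, length)` tuples, with anchors such as the larger MAT genes already identified. It keeps only the "largest candidate" criterion. The signal-peptide check mentioned in the speaker notes is not modelled.

```python
def candidates_between(scaffold_genes, left_anchor, right_anchor):
    """Genes lying strictly between two anchor genes on an ordered scaffold.

    scaffold_genes: list of (gene_id, length) in scaffold order.
    Anchor order does not matter.
    """
    ids = [g for g, _ in scaffold_genes]
    i, j = sorted((ids.index(left_anchor), ids.index(right_anchor)))
    return scaffold_genes[i + 1:j]


def pick_largest(candidates):
    """Largest candidate by length, or None if there are none."""
    return max(candidates, key=lambda g: g[1]) if candidates else None
```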
Major Findings (Pneumocystis)
• Taking Schizosaccharomyces pombe as reference
• RNA interference (RNAi) pathway
• Lacks crucial elements (e.g.)
• dcr1
• ARC Complex
• etc.
• Cell Fusion & Meiosis Onset Regulation
• Missing assorted elements
• Pheromone Action
• Incomplete signal transmission component set
• Signal transduction components all seem to be present
• Cell Cycle Regulation by Environmental Factors
• Missing some of the putative crucial elements (e.g.)
• wis4, wis1
• hog1
• atf1
• rst2
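Whether missing genes actually break a pathway, and which genes are choke points, can be checked on a directed graph. In the sketch below, a choke point is taken to be a node whose removal disconnects the signal input from the output (one common graph-theoretic reading, assumed here). Any example graph, for instance one chaining wis4, wis1, hog1 and atf1, would be a hypothetical simplification rather than the curated S. pombe map. A bypass such as a hypothetical alternative kinase is exactly the kind of "circumventing pathway" that turns an apparent break into a false alarm.

```python
from collections import deque


def reachable(graph, source, target, present):
    """Breadth-first search from source to target using only present nodes.

    graph: dict mapping node -> list of downstream nodes.
    """
    if source not in present or target not in present:
        return False
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt in present and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def choke_points(graph, source, target):
    """Nodes (other than source/target) whose removal cuts source from target."""
    nodes = set(graph) | {n for succ in graph.values() for n in succ}
    return sorted(n for n in nodes - {source, target}
                  if not reachable(graph, source, target, nodes - {n}))
```

Running `reachable` with only the genes actually found in Pneumocystis shows whether an apparent gap is fatal or bypassable in the modelled graph.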
Conclusions
• Pneumocystis species
• Probable primary homothallism
• Only two different, incomplete MAT ‘cassette’ candidates
• In the same scaffold
• No conventional silencing RNA interference pathway
• Additional argument in favor
• Further study required
• Sequencing of further clinical isolates seems to support this
• Existing annotation
• Should be assessed depending on its quality grade
• Odd situations should be reappraised
Looking for Orthologs

Blast_anotherFalseHit


Editor's Notes

  1. Good afternoon. Thank you for attending this talk about…
  2. …what could rightfully be titled the making of the paper shown here.
  3. This study began in the wake of a previous paper that accompanied the release of the annotated genome of Taphrina deformans. The initial challenge was to apply the same approach used there to locate and characterize the MAT genes, and some other sex-related genes, in the sequenced Pneumocystis genomes. The existence of three sequenced genomes and a well-annotated reference would allow some bold assumptions to be made, namely inferring the absence of a given gene.
  4. The methodology to be used would involve pinpointing genes by similarity to a reference protein sequence, manually fitting the CDS mapping, and confirming the result through the functional annotation of its product. The smallest MAT genes were to be located through relative position and synteny. A multitude of non-specific BLAST hits, together with the lack of a clear assessment of the role and relevance of many of the genes listed as of interest, led to a complete reappraisal of the approach. To avoid misunderstandings about gene symbols, the Schpo (S. pombe) nomenclature, as presented in UniProt/Swiss-Prot, was adopted.
  5. The main cause of hit non-specificity was readily traced to the fact that many of the genes studied share related domains, as can be seen here for the SPK1 protein. Several good hits across the target scaffolds result from a domain present in many different known architectures. Ignore this common fact at your own peril, and expect to find yourself in a disturbing maze of mirrors…
  6. To overcome this problem, gene annotation had to be made independent of plain sequence-similarity methods. Phylogenetic analysis would undoubtedly be the definitive approach, but because it is very demanding in terms of computation and expert interaction with the results, it can only be applied to restricted sets of sequences. In this case, protein domain architecture analysis is clearly the sensible route. It is based on the underlying alignment of all the accepted members of a given domain, and these alignments are usually broader than anything the regular phylogenetics user can envisage. … In this way, BLAST hits are used just as a broad location tool.
  7. The availability of a seemingly complete and well-annotated proteome opened the door to an assessment of the regulatory processes of interest. This allowed a more precise appraisal of the role each gene plays and of its relevance to the whole. Missing information was brought into this analysis from published references, and the remaining gaps were filled with data collected from KEGG PATHWAY.
  8. The final annotation protocol amounted to this diagram. … Please note that this approach, in which ortholog identification is made independent of the similarity search, frees the user from the shackles of a given e-value threshold. This means that TBLASTN searches can be run with almost preposterously permissive e-value thresholds in order to find the faintest similarity signals. Hit validation is carried out independently, so it is not affected by the significance level of the TBLASTN hit.
  9. Such a protocol enabled us to obtain ortholog candidates even when they are dispersed among different unordered contigs, as is the case for P. carinii. As an example, consider the DNA mismatch repair protein MSH2. As you can see, the functional annotation of the product of the concatenated exons found is not only very similar to the annotation of the COOH end of MSH2 in Schpo, but even matches the relevant PANTHER subfamily. The pattern of exon distribution among the genomes studied shows a typical increase in CDS fragmentation from S. complicata towards Pneumocystis.
  10. The same protocol enabled us to review the existing annotation for the genomic regions where the hits were found. The case shown is the a-factor processing enzyme STE23 from S. cerevisiae (no evident ortholog in S. pombe, maybe YAN2). In the genomic sequence of P. jirovecii, the TBLASTN hit pointed to a region already occupied by two shorter genes. As these genes had no meaningful functional annotation, and Genewise was able to match STE23 to exons very convincingly, the case for reviewing the annotation was very compelling. Moreover, both the length and the domain architecture were those expected for STE23.
  11. Of course, danger is always around the corner, and situations emerged where even the reference architecture proved ambiguous. Examples of this are the DNA damage checkpoint proteins RAD24 and RAD25. Their architecture is identical, and even when a phylogenetic analysis is carried out it is very difficult to assign the homologs found to each of the prototypes. Moreover, the branch posterior probability and the exon structure seem to point in different directions.
  12. The larger MAT genes were easily found by the standard protocol, but their smaller and more divergent neighbours had to be located through relative position and assessed by synteny. From the several putative candidates for matMi, the largest were chosen, and they appear to share a common trait: a match to the signal peptide signature at the NH2 end. The matPc site in Pneumocystis was found to be consistently occupied by an hsp104 ortholog, and no other candidate was found. The original annotation involved a very large gene that, after review, was split into the hsp104 and end4 genes.
  13. For Pneumocystis, several pathways seem to be impaired if the S. pombe reference proves valid. The most notable is the RNA interference pathway. Other circuits important for sexual reproduction also seem to be affected, namely the control of the onset of meiosis. These findings would allow some interesting conclusions, but some effort has to be put into finding possible circumventing pathways that could be at play.