Improving pan-genome annotation using whole genome multiple alignment

  1. Raunak Shrestha, 27th October 2011
     Source: Angiuoli SV, Hotopp JC, Salzberg SL, Tettelin H. Improving pan-genome annotation using whole genome multiple alignment. BMC Bioinformatics. 2011 Jun 30;12:272.
  2. Background
     • Describing the genetic diversity of an organism is difficult on the basis of a single reference genome
     • Pan-genomes
       • Greater intraspecific genetic variation, even among closely related strains
     • To aid gene prediction and annotation, the genome sequences of several closely related strains are required
     Figure source: http://en.wikipedia.org/wiki/File:Pan-genome-graphics.png
  3. Background
     Figure (Schnoes et al., 2009): the change in misannotation over time in the NR database for the 37 families investigated.
  4. Mugsy-Annotator (http://mugsy.sf.net)
     • Steps:
       1. Aligning multiple whole genomes
       2. Mapping orthologs among the genomes
       3. Identifying annotation anomalies
     • Objectives: (1) identifying orthologs and (2) evaluating the quality of annotated gene structures in prokaryotic genomes
  5. Determining Orthologs
     • Identifies orthologs on the basis of the whole genome alignment (WGA), sequence position, and sequence length
     • Expects one aligned segment per organism in the whole genome alignment
     • For segmental duplications:
       • Separate ortholog groups are reported for each copy only if the WGA identifies orthologous copies in other genomes
       • Otherwise, the duplication is not recognized and the copies are grouped under a single ortholog group
  6. Identification of annotation inconsistencies
     • Evaluates start codons, stop codons, and translation initiation sites (TIS)
  7. Data set
     • Neisseria meningitidis (Nmen) dataset of 20 genomes
       • Nmen verA contained 13 genomes
       • Nmen verB contained 7 genomes
       • The annotation pipeline differs between Nmen verA and Nmen verB
     • A dataset of genomes from 9 other bacterial species from the RefSeq database
  8. Comparison of the groups of orthologs for 20 Nmen genomes
     • Among the genes reported exclusively by only one method:
       • Intra-genome BLASTP matches predict many of these genes to be paralogs (40% for Mugsy-Annotator and 60% for OrthoMCL)
       • Some have functional names that indicate transposases
       • Some are hypothetical proteins
     • The paper argues that OrthoMCL clusters paralogs and orthologs into a single group
  9. Run Time Performance
     • Nmen dataset of 20 genomes: ~4 h on a single CPU
       • ~2 h for WGA with Mugsy
       • ~2 h for comparing annotations with Mugsy-Annotator
     • OrthoMCL consumed ~32 CPU hours
     • The WGA method is computationally efficient and has a significant runtime advantage over BLAST-based OrthoMCL
  11. Consistency of annotated gene structures in the pan-genomes of several species, as reported by Mugsy-Annotator
  12. Improving annotation consistency
     • When TIS annotations are inconsistent, Mugsy-Annotator suggests alternative gene structures that improve annotation consistency
     • Strategy: look for a conserved TIS in close proximity to the previously annotated TIS
  13. Conclusion
     • Aids in identifying and comparing gene content across a pan-genome
     • Aids annotation and re-annotation of genes within a pan-genome rather than within a single genome
     • The study demonstrates significant variation in annotation, due primarily to differing bioinformatics approaches rather than to true biological variation
     • Mugsy-Annotator: an efficient, accurate method for finding orthologs within a pan-genome
     • Mugsy (a WGA approach) is more computationally efficient than BLAST-based approaches for finding orthologs
  14. Critique
     • Mugsy-Annotator requires pre-predicted annotation information and is therefore not an independent annotation tool
     • Mugsy-Annotator still has difficulty resolving segmental duplications and paralogs
     • It would have been even better if the authors had measured the performance of Mugsy-Annotator on pan-genome datasets with larger evolutionary distances
  15. Questions?
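The ortholog-mapping step on slide 5 (orthologs inferred from the whole genome alignment, sequence position, and length, with one segment expected per organism) can be illustrated with a minimal sketch. It assumes every annotated gene has already been projected onto alignment-column coordinates; the `AlignedGene` record, the greedy clustering, and the 50% overlap threshold (`min_overlap`) are hypothetical choices for illustration, not Mugsy-Annotator's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedGene:
    """A gene projected onto alignment-column coordinates (hypothetical input format)."""
    genome: str
    gene_id: str
    block: int  # index of the alignment block containing the gene
    start: int  # first aligned column (inclusive)
    end: int    # last aligned column (inclusive)


def overlap_fraction(a: AlignedGene, b: AlignedGene) -> float:
    """Fraction of the shorter gene covered by the overlap of the two aligned spans."""
    if a.block != b.block:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return max(overlap, 0) / shorter


def group_orthologs(genes, min_overlap=0.5):
    """Greedily cluster genes whose aligned spans overlap, at most one gene per genome per group."""
    groups: list[list[AlignedGene]] = []
    for gene in sorted(genes, key=lambda g: (g.block, g.start)):
        for group in groups:
            if gene.genome in {g.genome for g in group}:
                continue  # one segment per genome, as the WGA-based approach expects
            if all(overlap_fraction(gene, g) >= min_overlap for g in group):
                group.append(gene)
                break
        else:
            groups.append([gene])  # no compatible group found: start a new ortholog group
    return groups


if __name__ == "__main__":
    genes = [
        AlignedGene("A", "a1", 0, 100, 400),
        AlignedGene("B", "b1", 0, 110, 400),
        AlignedGene("C", "c1", 0, 700, 900),
    ]
    for group in group_orthologs(genes):
        print([g.gene_id for g in group])
    # prints ['a1', 'b1'] then ['c1']
```

Because a genome may contribute only one gene per group, two copies of a duplicated gene in the same genome always land in separate groups here; the real tool's handling of segmental duplications, as slide 5 notes, depends on what the WGA reports.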
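The consistency evaluation on slide 6 (comparing start and stop codons across orthologs) can be sketched in the same spirit. This assumes each ortholog group is given as a mapping from genome name to the aligned start and stop columns of its gene, all on the forward strand; the function name `classify_group` and its category labels are hypothetical, and the paper's own classification may be finer-grained.

```python
def classify_group(spans: dict[str, tuple[int, int]]) -> str:
    """Classify an ortholog group by comparing aligned start and stop columns.

    spans maps genome name -> (start_column, stop_column) of the annotated gene
    in alignment coordinates (hypothetical input; forward strand assumed).
    """
    starts = {start for start, _ in spans.values()}
    stops = {stop for _, stop in spans.values()}
    start_ok = len(starts) == 1  # every genome annotates the same aligned start
    stop_ok = len(stops) == 1    # every genome annotates the same aligned stop
    if start_ok and stop_ok:
        return "consistent"
    if stop_ok:
        return "inconsistent start"
    if start_ok:
        return "inconsistent stop"
    return "inconsistent start and stop"


if __name__ == "__main__":
    print(classify_group({"A": (100, 400), "B": (100, 400)}))  # consistent
    print(classify_group({"A": (100, 400), "B": (130, 400)}))  # inconsistent start
```

Working in alignment columns is what makes the comparison meaningful: two genomes can annotate the "same" start at different raw genome coordinates, but an aligned start should fall in the same column.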
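Using the run times on slide 9, and assuming the ~4 h single-CPU run of Mugsy plus Mugsy-Annotator and the ~32 CPU hours reported for OrthoMCL are directly comparable, the rough speed-up works out as:

```latex
\text{speed-up} \approx
\frac{32\ \text{CPU h (OrthoMCL)}}{2\ \text{h (Mugsy WGA)} + 2\ \text{h (Mugsy-Annotator)}}
= \frac{32}{4} = 8
```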
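The TIS strategy on slide 12 (look for a conserved TIS close to the previously annotated one) might be sketched as follows. Assumptions: the input is a gapped, forward-strand alignment of the region around the gene; alignment columns are treated as nucleotide positions; ATG, GTG, and TTG are accepted as start codons; and the 30-column `window` and the function name `conserved_start_candidates` are hypothetical. The sketch does not check for in-frame stop codons between a candidate and the annotated start.

```python
from collections import Counter

START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial start codons


def conserved_start_candidates(alignment, annotated_starts, window=30):
    """Return alignment columns near the annotated TIS where every genome has a start codon.

    alignment: dict genome -> gapped, equal-length, forward-strand sequence (hypothetical input).
    annotated_starts: dict genome -> annotated start column (0-based) in alignment coordinates.
    Simplification: columns are treated as nucleotides; codons containing gaps are rejected.
    """
    # Use the most frequently annotated start as the reference reading frame.
    ref_start, _ = Counter(annotated_starts.values()).most_common(1)[0]
    length = len(next(iter(alignment.values())))
    lo = max(0, min(annotated_starts.values()) - window)
    hi = min(length - 3, max(annotated_starts.values()) + window)
    candidates = []
    for col in range(lo, hi + 1):
        if (col - ref_start) % 3 != 0:
            continue  # stay in the reading frame of the reference start
        codons = {seq[col:col + 3].upper() for seq in alignment.values()}
        if codons <= START_CODONS:  # a start codon in every genome at this column
            candidates.append(col)
    # Prefer candidates closest to the reference start.
    return sorted(candidates, key=lambda c: (abs(c - ref_start), c))


if __name__ == "__main__":
    alignment = {
        "A": "CCCATGAAAATGCCC",
        "B": "CCCATGAAAATGCCC",
        "C": "CCCTTGAAAGTGCCC",
    }
    print(conserved_start_candidates(alignment, {"A": 3, "B": 3, "C": 9}))  # [3, 9]
```

In this example genomes A and B are annotated at column 3 and genome C at column 9; because column 3 holds a start codon (ATG or TTG) in all three genomes, it is ranked first, suggesting that C's start could be moved upstream to match the others.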
