BG7, a new system for bacterial genome annotation designed for NGS data

Slides from the talk presented at the conference "Applied Bioinformatics and Public Health" at Cambridge during 1-3 June 2011

Published in: Health & Medicine

  1. BG7: A new system for bacterial genome annotation designed for NGS data. www.ohnosequences.com www.era7bioinformatics.com
  2. Motivation: The need for a system specially designed for NGS data annotation, with a pipeline unbiased by existing annotation systems designed for Sanger sequences. The need for a versatile system able to annotate genes even at the preliminary assembly stage of the genome. Special focus is given to the detection of “unexpected proteins” without orthologs in close genomes (horizontally acquired genes, phage genes, plasmid genes…). A fast, automated and scalable process to face the challenge of analyzing the huge number of genomes being sequenced with NGS technologies.
  3. Features: 1. A new approach. 2. It’s tolerant to NGS errors. 3. It’s based on cloud computing. 4. It uses bio4j.
  4. Features: Approach. ORF prediction is based on protein similarity.
  5. Features: Approach. Use as much information as you can (not just start/stop signals). (Diagram: the sequence TGGATGTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAAAGGCTGA shown twice, with labels A, B, C, D, E.)
  6. Features: Approach. Standard: Sequence; ORF prediction (Glimmer); Function prediction (Blast). BG7: Sequence; Protein searching (Blast); CDS prediction; RNA searching (Blast).
  7. Features: NGS errors. Issue (Technology): Genomes in several contigs (All); Sequencing errors in start/stop codons (Illumina substitutions, 454 indels); Frameshifts (454 indels); Horizontal gene transfer (None). The BG7 system is tolerant to all these issues.
  8. Features: Cloud computing. AWS (Amazon Web Services). Completely scalable. On demand. Fast. Cheap. Useful in tracking outbreaks. 1 genome in ~2 hours; 100 genomes in ~2 hours once you’ve got the reference proteins.
  9. Features: bio4j. It uses bio4j (www.bio4j.com). Much richer annotations.
  10. How it works?
  11. Step 1: Expert manual selection of reference sequences. Step 2: Protein search (Blast). Step 3: CDS definition (HSPs merge; extension of the similarity region searching for start/stop signals). Step 4: Solving conflicts (solving duplicates; solving overlaps). Step 5: RNA search (Blast). Step 6: Incorporation of RNA genes (definition of RNA genes; conflicts with previously annotated protein-coding genes are solved).
  12. Step 2: Protein search with tBlastn. Reference proteins (aa) are searched in the contig sequences. (Diagram labels: A, B, C; Input contigs (aa).)
  13. Step 3: CDS definition. Merging HSPs. (Diagram labels: Several HSPs; Input contigs (aa); Protein.)
  14. Step 3: CDS definition. Merging HSPs. We merge the HSPs to form a single similarity region. (Diagram labels: Several HSPs; Input contigs (aa); Protein.)
  15. Step 3: CDS definition. Search for start/stop signals. We then search for start/stop signals upstream and downstream of the region with high similarity to the protein.
  16. Step 3: CDS definition. Even if we don’t find a start/stop codon for a given CDS, we keep it. We just mark it accordingly.
  17. Step 4: Solving conflicts. Duplicates.
  18. Step 4: Solving conflicts. Duplicates.
  19. Step 4: Solving conflicts. Overlapping CDS.
  20. Step 5: RNA search. Blastn. Reference RNAs (nt) are searched in the contigs. (Diagram label: Input contigs (nt).)
  21. Step 6: Incorporation of RNA genes. Definition of RNA genes. (Diagram label: Input contigs (nt).)
  22. Step 6: Incorporation of RNA genes. Conflicts with protein-coding genes are solved. If a protein-coding gene and an RNA gene are found in the same region, the RNA gene is selected over the protein-coding one.
  23. Finally. (Diagram: the sequence TGGATGTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAAAGGCTGA with labels A, B, C, D, E.)
  24. Comparisons. We’ve compared the NCBI annotations for Escherichia coli str. K-12 substr. MG1655 (RefSeq ID NC_000913) with the BG7 annotations.
  25. Comparisons. The results we got were: Protein coding genes: NCBI 4145; BG7 4370 (selected genes) and 4951 (all detected genes: selected + dismissed). RNA: NCBI 175; BG7 156.
  26. Comparisons.
  27. Comparisons: Conclusions. Even in a disadvantageous situation (not an NGS project, and a very well annotated genome), in a single round of annotation we got: ~95% of the NCBI protein coding genes; ~89% of the NCBI RNA genes; 419 new proteins detected.
  28. Upcoming features. Improvements now focused on: the overlap-solving phase; detection of very small proteins; and any new need we find while using it.
  29. Thanks: Oh no sequences! team. Raquel Tobes: Bioinformatician, main advisor. Pablo Pareja: Main developer. Eduardo Pareja: Scientific advisor. Eduardo Pareja-Tobes: Mathematician, advisor. Carmen Torrecillas: Junior Bioinformatician. Marina Manrique: Bioinformatician.
  30. Thanks for your attention!
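
A minimal Python sketch of two steps described in the slides: merging the HSPs of one reference protein into a single similarity region (Step 3), and preferring an RNA gene over an overlapping protein-coding gene (Step 6). This is an illustration under stated assumptions, not BG7's actual code: the `Gene` type, the function names, the half-open coordinates and the `max_gap` threshold are all hypothetical and do not come from the talk.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record; coordinates are 0-based, half-open."""
    name: str
    kind: str   # "CDS" or "RNA"
    start: int
    end: int


def merge_hsps(hsps: List[Tuple[int, int]], max_gap: int = 30) -> List[Tuple[int, int]]:
    """Merge HSPs (start, end) of one reference protein on one contig strand.

    HSPs that overlap, or lie within max_gap bases of each other, are fused
    into one similarity region. max_gap is an assumed value, not BG7's.
    """
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(hsps):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps(a: Gene, b: Gene) -> bool:
    """True when two half-open intervals share at least one position."""
    return a.start < b.end and b.start < a.end


def prefer_rna(genes: List[Gene]) -> List[Gene]:
    """Drop every CDS that overlaps an RNA gene; keep all RNA genes."""
    rnas = [g for g in genes if g.kind == "RNA"]
    kept = [
        g for g in genes
        if g.kind == "RNA" or not any(overlaps(g, r) for r in rnas)
    ]
    return sorted(kept, key=lambda g: g.start)
```

For example, `merge_hsps([(190, 300), (100, 200), (500, 600)])` fuses the first two overlapping HSPs and leaves the distant third one as a separate region. The slides also say that a CDS without a detected start or stop codon is kept and marked; a fuller sketch would carry that flag on the record.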
