BG7, a new system for bacterial genome annotation designed for NGS data

Slides from the talk presented at the conference "Applied Bioinformatics and Public Health" at Cambridge during 1-3 June 2011

Published in: Health & Medicine
Transcript of "BG7, a new system for bacterial genome annotation designed for NGS data"

  1. BG7: A new system for bacterial genome annotation designed for NGS data. www.ohnosequences.com www.era7bioinformatics.com
  2. Motivation (talk outline: Motivation, Features, How it works?, Comparisons, Upcoming features)
     • The need for a system specially designed for NGS data annotation, with a pipeline unbiased by existing annotation systems designed for Sanger sequences
     • The need for a versatile system able to annotate genes even at the preliminary assembly stage of the genome
     • Special focus is given to the detection of "unexpected proteins" without orthologs in close genomes (horizontally acquired genes, phage genes, plasmid genes…)
     • A fast, automated and scalable process to face the challenge of analyzing the huge number of genomes being sequenced with NGS technologies
  3. Features
     1. A new approach
     2. It's tolerant to NGS errors
     3. It's based on cloud computing
     4. It uses bio4j
  4. Features: Approach. ORF prediction is based on protein similarity.
  5. Features: Approach. Use as much information as you can (not just start/stop signals). [Figure: the sequence TGGATGTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAAAGGCTGA shown twice, the second time with regions labelled A, B, C, D, E]
  6. Features: Approach
     • Standard: Sequence → ORF prediction (Glimmer) → Function prediction (Blast)
     • BG7: Sequence → Protein searching (Blast) → CDS prediction → RNA searching (Blast)
  7. Features: NGS errors (issue: technology affected)
     • Genomes in several contigs: all
     • Sequencing errors in start/stop codons: Illumina substitutions, 454 indels
     • Frameshifts: 454 indels
     • Horizontal gene transfer: none
     The BG7 system is tolerant to all these issues.
  8. Features: Cloud computing. AWS (Amazon Web Services)
     • Completely scalable
     • On demand
     • Fast
     • Cheap
     • Useful in tracking outbreaks
     1 genome in ~2 hours; 100 genomes in ~2 hours, once you've got the reference proteins.
  9. Features: bio4j. It uses bio4j (www.bio4j.com): much richer annotations.
  10. How it works?
  11. Pipeline steps
      1. Expert manual selection of reference sequences
      2. Protein search: Blast
      3. CDS definition: HSPs merge; extension of the similarity region, searching for start/stop signals
      4. Solving conflicts: solving duplicates; solving overlaps
      5. RNA search: Blast
      6. Incorporation of RNA genes: definition of RNA genes; conflicts with previously annotated protein coding genes are solved
  12. Step 2: Protein search with tBlastn. Reference proteins (aa) are searched in the contig sequences. [Figure labels: proteins A, B, C; input contigs (aa)]
  13. Step 3: CDS definition. Merging HSPs. [Figure labels: several HSPs; input contigs (aa); protein]
  14. Step 3: CDS definition. Merging HSPs. We merge the HSPs to form a single similarity region. [Figure labels: several HSPs; input contigs (aa); protein]
  15. Step 3: CDS definition. Search for start/stop signals. We then search for start/stop signals upstream and downstream of the region with high similarity to the protein.
  16. Step 3: CDS definition. Even if we don't find a start/stop codon for a given CDS, we keep it; we just mark it accordingly.
  17. Step 4: Solving conflicts. Duplicates.
  18. Step 4: Solving conflicts. Duplicates.
  19. Step 4: Solving conflicts. Overlapping CDS.
  20. Step 5: RNA search. Blastn. Reference RNAs (nt) are searched in the contigs. [Figure label: input contigs (nt)]
  21. Step 6: Incorporation of RNA genes. Definition of RNA genes. [Figure label: input contigs (nt)]
  22. Step 6: Incorporation of RNA genes. Conflicts with protein coding genes are solved: if we find both a protein coding gene and an RNA gene in a particular region, the RNA gene is selected over the protein coding one.
  23. Finally. [Figure: the sequence TGGATGTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAACGGAAAGGCTGA with regions labelled A, B, C, D, E]
  24. Comparisons. We've compared the NCBI annotations for Escherichia coli str. K-12 substr. MG1655 (RefSeq ID NC_000913) with the BG7 annotations.
  25. Comparisons. The results we got were:
      • Protein coding genes: NCBI 4145; BG7 4370 (selected genes) and 4951 (all detected genes: selected + dismissed)
      • RNA: NCBI 175; BG7 156
  26. Comparisons
  27. Comparisons: Conclusions. Even in a disadvantageous situation (not an NGS project, and a very well annotated genome), a single round of annotation gave us:
      • ~95% of the NCBI protein coding genes
      • ~89% of the NCBI RNA genes
      • 419 new proteins detected
  28. Upcoming features. Improvements are now focused on:
      • The overlap-solving phase
      • Detection of very small proteins
      • Any new need we find while using it
  29. Thanks: Oh no sequences! team
      • Raquel Tobes: Bioinformatician, main advisor
      • Pablo Pareja: Main developer
      • Eduardo Pareja: Scientific advisor
      • Eduardo Pareja-Tobes: Mathematician, advisor
      • Carmen Torrecillas: Junior Bioinformatician
      • Marina Manrique: Bioinformatician
  30. Thanks for your attention! www.ohnosequences.com www.era7bioinformatics.com
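
Steps 3 and 6 of the pipeline described in the slides (merging HSPs into one similarity region, extending it to start/stop signals, keeping and marking a CDS when a signal is missing, and preferring RNA genes over overlapping protein coding genes) can be sketched roughly as below. This is only an illustration under stated assumptions, not BG7's actual code: the names `Region`, `merge_hsps`, `extend_to_signals`, `prefer_rna` and the `max_codons` limit are hypothetical, the codon sets are the usual bacterial ones rather than anything stated in the talk, and all HSPs are assumed to lie in the same reading frame on the forward strand.

```python
from dataclasses import dataclass

# Assumed codon sets (common bacterial start/stop codons); not taken from the slides.
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class Region:
    start: int               # 0-based, inclusive
    end: int                 # 0-based, exclusive
    has_start: bool = False  # True once a start codon has been found
    has_stop: bool = False   # True once a stop codon has been found


def merge_hsps(hsps):
    """Collapse the (start, end) HSPs of one reference protein into one region."""
    return Region(min(s for s, _ in hsps), max(e for _, e in hsps))


def extend_to_signals(contig, region, max_codons=100):
    """Search upstream for a start codon and downstream for a stop codon."""
    start, end = region.start, region.end

    # Upstream: walk back one codon at a time, in frame with the region.
    has_start = contig[start:start + 3] in START_CODONS
    pos = start
    for _ in range(max_codons):
        if has_start:
            break
        pos -= 3
        if pos < 0:
            break  # ran off the contig edge
        codon = contig[pos:pos + 3]
        if codon in STOP_CODONS:
            break  # an in-frame stop ends the upstream search
        if codon in START_CODONS:
            start, has_start = pos, True

    # Downstream: walk forward until a stop codon or the contig edge.
    has_stop = end >= 3 and contig[end - 3:end] in STOP_CODONS
    pos = end
    for _ in range(max_codons):
        if has_stop or pos + 3 > len(contig):
            break
        codon = contig[pos:pos + 3]
        pos += 3
        if codon in STOP_CODONS:
            end, has_stop = pos, True

    # The region is returned even when a signal is missing; the flags mark it.
    return Region(start, end, has_start, has_stop)


def prefer_rna(cds_regions, rna_regions):
    """Drop every CDS that overlaps an RNA gene, so the RNA gene wins."""
    return [c for c in cds_regions
            if not any(c.start < r.end and r.start < c.end for r in rna_regions)]
```

For example, `extend_to_signals("CCCATGAAAGGGCCCTAAGG", merge_hsps([(6, 9), (9, 15)]))` grows the merged region to cover the upstream `ATG` and the downstream `TAA`, while a region on a contig with neither signal comes back unchanged with both flags left `False`.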