Pathogen Profiling Pipeline

3 comments

  • monzoor, 4 months ago
    I have sent you the SOrt-ITEMS PDF by mail.
    Imagine a situation where reads originate from a new genus closely related to Staphylococcus (but not Staphylococcus itself). Since sequences from this genus are absent from the database, hits will be shown from Staphylococcus and Bacilli. If I assign the read to Staphylococcus based on those hits, the assignment is wrong; ideally it should have been made at a higher taxonomic level. Do you get the point?
    We took all these scenarios into consideration when designing SOrt-ITEMS.

    Please go through the PDF I sent you by mail and send any feedback or questions.

    Regards,
    Monzoor
  • integer, 4 months ago
    Hi, I am actually looking for a way to modify the LCA algorithm to look at the score lengths. I'll be working with Matthews on this, and I think I get your argument.

    So for example, if the BLAST report for a read has high-scoring hits to Staphylococcus aureus, Staphylococcus sanguinis, and Staphylococcus epidermidis, the last common ancestor would be 'Staphylococcus'. We are interested in modifying the calculation to look at the distribution of score strengths and assignments to see if we can make a better assignment. For example, if we get 15 reads that hit the genus 'Staphylococcus' and one read that hits the class 'Bacilli', we could probably include the Bacilli hit with the Staphylococcus genus, because the genus Staphylococcus is in the class Bacilli.

    Could you please advise and share the paper you mentioned that gives a better taxonomic assignment? Thanks.
  • monzoor, 4 months ago
    The Lowest Common Ancestor approach you wish to incorporate in your pipeline results in many false positives, especially when the input sequences come from new organisms (which is generally the case with metagenomic samples). The assignments also tend to lose specificity due to the LCA approach.

    Please refer to the paper mentioned below for a better taxonomic assignment approach (algorithm)

    SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences.

    This software is available for free download at
    http://metagenomics.atc.tcs.com/binning/SOrt-ITEMS

Pathogen Profiling Pipeline - Presentation Transcript

  1. Pathogen Profiling Pipeline A Metagenomics Tool for Rapid Identification of Pathogens from Clinical Specimens Tom Matthews National Microbiology Laboratory Public Health Agency of Canada thomas_matthews@phac-aspc.gc.ca June 27, 2009 Pathogen Profiling Pipeline 1 M3 SIG – ISMB/ECCB 2009
  2. Introduction ● With novel/emerging diseases, classical pathogen identification may not always produce results ● Advances in next-gen sequencing technology ● Characterize samples at genomic level ● Pathogen Profiling Pipeline ● Bioinformatics pipeline ● Analysis of host and microbial nucleic acids
  3. Features ● Nucleotide and protein BLAST analysis ● Unbiased analysis of input reads ● Clustered execution ● Web front-end ● Custom analysis pipelines ● Easily viewed results
  4. Filtering Overview ● BLAST analysis performed against reference sequence database ● Assigns hits according to cut-off criteria ● Calculate equivalent hits ● Clustered BLAST and filtering
  5. Last Common Ancestor Estimation ● Uses equivalent hits for LCA calculation ● User specifies equivalent hit percentage cutoff ● NCBI taxonomy database for ancestor lookup ● Walks up taxonomy tree to find lowest intersection of all leaf nodes ● Unbiased approach (Diagram labels: Vaccinia, Variola, Orthopoxvirus, Taterapox, Camelpox)
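
The tree walk described on slide 5 can be sketched roughly as follows. This is a minimal illustration, not the pipeline's own code: the `PARENT` map is a hand-written, simplified stand-in for the NCBI taxonomy, and the function names `lineage` and `lowest_common_ancestor` are hypothetical.

```python
# Toy child -> parent map. ASSUMPTION: a simplified stand-in for the NCBI
# taxonomy database, containing only the names needed for this example.
PARENT = {
    "Vaccinia": "Orthopoxvirus",
    "Variola": "Orthopoxvirus",
    "Taterapox": "Orthopoxvirus",
    "Camelpox": "Orthopoxvirus",
    "Orf virus": "Parapoxvirus",
    "Orthopoxvirus": "Poxviridae",
    "Parapoxvirus": "Poxviridae",
    "Poxviridae": "root",
}


def lineage(taxon, parent=PARENT):
    """Return the path from a taxon up to the top of the tree, taxon first."""
    path = [taxon]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def lowest_common_ancestor(taxa, parent=PARENT):
    """Walk up from the first taxon; return the first node on every lineage."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("need at least one taxon")
    other_lineages = [set(lineage(t, parent)) for t in taxa[1:]]
    for node in lineage(taxa[0], parent):
        if all(node in members for members in other_lineages):
            return node
    return None  # the taxa do not share any ancestor in this map


print(lowest_common_ancestor(["Vaccinia", "Variola", "Taterapox", "Camelpox"]))
print(lowest_common_ancestor(["Vaccinia", "Orf virus"]))
```

With the four leaves from the slide's diagram the walk stops at Orthopoxvirus; adding a hit from a different genus pushes the assignment up to the family level.
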
  6. Filtering Outputs ● Hits – High scoring reads passing filtering values ● Equivalent Hits – BLAST hits matching to within an assigned percentage of the top hit's bitscore ● Last Common Ancestors – Calculated (estimated) LCA of all the equivalent hits ● Unassigned – Passed to the next pipeline step
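
One possible reading of the "equivalent hits" definition on slide 6 is sketched below. The slides do not give the exact formula, so treating "within a percentage of the top hit's bitscore" as `bitscore >= top * (1 - percent / 100)` is an assumption, as are the `Hit` type, the `min_bitscore` cut-off parameter, and the example scores.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit for a read (hypothetical minimal record)."""
    subject: str
    bitscore: float


def equivalent_hits(hits, percent=10.0, min_bitscore=0.0):
    """Keep hits whose bitscore is within `percent` of the best passing hit.

    ASSUMPTION: hits below `min_bitscore` are discarded first, standing in
    for the slide's unspecified "cut-off criteria".
    """
    passing = [h for h in hits if h.bitscore >= min_bitscore]
    if not passing:
        return []  # nothing passed filtering; the read stays unassigned
    top = max(h.bitscore for h in passing)
    threshold = top * (1.0 - percent / 100.0)
    return [h for h in passing if h.bitscore >= threshold]


# Made-up scores, for illustration only.
hits = [Hit("Vaccinia", 200.0), Hit("Variola", 190.0), Hit("Camelpox", 150.0)]
print([h.subject for h in equivalent_hits(hits, percent=10.0)])
```

The subjects returned here would then be the leaf nodes passed to the LCA calculation; a read with no passing hits would fall into the "Unassigned" output.
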
  7. Example Analysis Method ● BLAST reads against host database ● Remove host reads ● BLAST unassigned against reference database ● Filter hits vs. unassigned ● Repeat... ● Post analysis (Diagram labels: Sample reads; Host genome BLAST and Filtering; Non-host reads; Viral genome BLAST and Filtering; Bacterial genome BLAST and Filtering; Protozoan genome BLAST and Filtering; Fungal genome BLAST and Filtering; Pool results; Unique organisms in sample)
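
The staged method on slide 7, where reads left unassigned by one BLAST-and-filter step are passed to the next, might be sketched like this. The function `run_stages` and the dictionary-based classifiers are hypothetical stand-ins; a real stage would run BLAST against a reference database and apply the filtering criteria.

```python
def run_stages(reads, stages):
    """Pass reads through classification stages in order.

    `stages` is a list of (stage_name, classify) pairs, where classify(read)
    returns a label or None. Reads a stage cannot label go to the next stage.
    """
    assigned = {}
    remaining = list(reads)
    for stage_name, classify in stages:
        still_unassigned = []
        for read in remaining:
            label = classify(read)
            if label is None:
                still_unassigned.append(read)
            else:
                assigned[read] = (stage_name, label)
        remaining = still_unassigned
    return assigned, remaining


# ASSUMPTION: dictionary lookups stand in for BLAST and filtering here.
host_stage = {"read1": "Homo sapiens"}.get
viral_stage = {"read2": "Vaccinia"}.get

assigned, unassigned = run_stages(
    ["read1", "read2", "read3"],
    [("host", host_stage), ("viral", viral_stage)],
)
print(assigned)
print(unassigned)
```

Reads labeled by the host stage can be dropped from the pooled results, while anything still unassigned after the last stage is left for post analysis.
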
  8. Pipeline Construction
  9. Pipeline Construction
  10. Pipeline Construction
  11. Pipeline Construction
  12. Pipeline Construction
  13. Pipeline Execution ● Custom execution manager ● Computes dependencies and monitors running jobs ● Distribute jobs across Linux cluster ● Facilitates unattended clustered executions
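
The slides do not describe how the execution manager computes dependencies. As a rough illustration of the general idea only, the sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group jobs into batches that could run in parallel. The job names and the `execution_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies):
    """Group jobs into batches; each batch depends only on earlier batches.

    `dependencies` maps a job name to the set of jobs it must wait for.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)  # pretend the whole batch finished
    return batches


# ASSUMPTION: an invented job graph where input is split into two chunks
# that are BLASTed independently before a shared filtering step.
jobs = {
    "split": set(),
    "blast_chunk_1": {"split"},
    "blast_chunk_2": {"split"},
    "filter": {"blast_chunk_1", "blast_chunk_2"},
    "report": {"filter"},
}
for batch in execution_batches(jobs):
    print(batch)
```

A real manager would submit each ready job to the cluster and call `done` as individual jobs finish, rather than waiting for a whole batch.
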
  14. Reports
  15. Drill Down Reports
  16. Abundance View ● Displays abundance of taxonomic hits
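
An abundance view like the one on slide 16 presumably starts from a count of reads per assigned taxon. A minimal sketch, assuming assignments are available as a read-to-taxon mapping (the `abundance` function and the sample data are hypothetical):

```python
from collections import Counter


def abundance(assignments):
    """Return (taxon, read_count, fraction) tuples, most abundant first."""
    counts = Counter(assignments.values())
    total = sum(counts.values())
    return [(taxon, n, n / total) for taxon, n in counts.most_common()]


# Made-up assignments, for illustration only.
example = {
    "read1": "Vaccinia",
    "read2": "Vaccinia",
    "read3": "Influenza A virus",
}
for taxon, n, fraction in abundance(example):
    print(f"{taxon}: {n} reads ({fraction:.0%})")
```
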
  17. Example Run ● Mouth swab input samples ● Two pools: ● Samples spiked with Vaccinia and Influenza A ● Background reference sample
  18. Example Run
  19. Example Run
  20. Example Run
  21. Example Run
  22. Wrap-up ● Unbiased analysis of input reads ● Custom analysis pipelines ● Last common ancestor calculation ● Clustered execution ● Multiple report views ● Exportable results
  23. Acknowledgements ● Gary Van Domselaar ● Morag Graham ● Shaun Tyler ● Heather Kent ● Kim Melnychuk ● Christine Bonner ● Geoff Peters ● Philip Mabon

Metagenomic sample analysis pipeline.
Talk at M3 SIG

© All Rights Reserved
