Pathogen Profiling Pipeline

Metagenomic sample analysis pipeline.
Talk at the M3 SIG at ISMB 2009 in Stockholm, Sweden.

Published in: Technology, Business
2 Comments
  • I have sent you the SOrt-ITEMS PDF by mail.
    Imagine a situation where reads originate from a new genus closely related to Staphylococcus (but not Staphylococcus itself). Since sequences corresponding to this genus are absent from the database, hits will be shown from Staph and Bacilli. If I assign the read to Staph based on the hits, the assignment is wrong. Ideally it should have been made at a higher taxonomic level. Do you see the point?
    We took all these scenarios into consideration before designing SOrt-ITEMS.

    Please go through the PDF I sent you by mail and share any feedback or questions.

    Regards
    Monzoor
  • The Lowest Common Ancestor approach you wish to incorporate in your pipeline results in a lot of false positives, especially when the input sequences arise from new organisms (which is generally the case with metagenomic samples). The assignments also tend to lose specificity due to the LCA approach.

    Please refer to the paper mentioned below for a better taxonomic assignment approach (algorithm).

    SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences.

    This software is available for free download at
    http://metagenomics.atc.tcs.com/binning/SOrt-ITEMS


Pathogen Profiling Pipeline

  1. Pathogen Profiling Pipeline
     A Metagenomics Tool for Rapid Identification of Pathogens from Clinical Specimens
     Tom Matthews
     National Microbiology Laboratory, Public Health Agency of Canada
     thomas_matthews@phac-aspc.gc.ca
     June 27, 2009
     M3 SIG – ISMB/ECCB 2009
  2. Introduction
     ● With novel or emerging diseases, classical pathogen identification may not always produce results
     ● Advances in next-gen sequencing technology
     ● Characterize samples at genomic level
     ● Pathogen Profiling Pipeline
     ● Bioinformatics pipeline
     ● Analysis of host and microbial nucleic acids
  3. Features
     ● Nucleotide and protein BLAST analysis
     ● Unbiased analysis of input reads
     ● Clustered execution
     ● Web front-end
     ● Custom analysis pipelines
     ● Easily viewed results
  4. Filtering Overview
     ● BLAST analysis performed against reference sequence database
     ● Assigns hits according to cut-off criteria
     ● Calculates equivalent hits
     ● Clustered BLAST and filtering
  5. Last Common Ancestor Estimation
     ● Uses equivalent hits for LCA calculation
     ● User specifies equivalent hit percentage cutoff
     ● NCBI taxonomy database for ancestor lookup
     ● Walks up taxonomy tree to find lowest intersection of all leaf nodes
     ● Unbiased approach
     (Figure: example taxonomy tree with Orthopoxvirus above Vaccinia, Variola, Taterapox and Camelpox)
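
The tree walk described on slide 5 can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the `PARENT` map is a simplified, hand-written stand-in for the NCBI taxonomy database (intermediate ranks are omitted), and the function names `lineage` and `lowest_common_ancestor` are hypothetical.

```python
# Hypothetical, simplified child -> parent map; a real run would load the
# NCBI taxonomy instead of this hand-written table.
PARENT = {
    "Vaccinia": "Orthopoxvirus",
    "Variola": "Orthopoxvirus",
    "Taterapox": "Orthopoxvirus",
    "Camelpox": "Orthopoxvirus",
    "Orf virus": "Parapoxvirus",
    "Orthopoxvirus": "Poxviridae",
    "Parapoxvirus": "Poxviridae",
    "Poxviridae": "Viruses",
    "Viruses": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_ancestor(taxa):
    """Return the deepest node shared by the lineages of all given taxa."""
    if not taxa:
        raise ValueError("need at least one taxon")
    lineages = [lineage(t) for t in taxa]
    shared = set(lineages[0]).intersection(*lineages[1:])
    # Walk up from the first taxon; the first shared node is the lowest one.
    for node in lineages[0]:
        if node in shared:
            return node
    raise ValueError("taxa do not share a root")
```

With this toy tree, equivalent hits to Vaccinia and Variola resolve to Orthopoxvirus, while adding a hit from another genus pushes the assignment up to the family level.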
  6. Filtering Outputs
     ● Hits – High scoring reads passing filtering values
     ● Equivalent Hits – BLAST hits matching to within an assigned percentage of the top hit's bitscore
     ● Last Common Ancestors – Calculated (estimated) LCA of all the equivalent hits
     ● Unassigned – Passed to the next pipeline step
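
One plausible reading of the "equivalent hits" definition on slide 6 is sketched below in Python. The threshold formula (top bitscore reduced by the given percentage), the `Hit` record, and the `min_bitscore` parameter are assumptions made for illustration; the slides do not give the exact cut-off criteria.

```python
from collections import namedtuple

# Hypothetical record for one BLAST hit of a single read.
Hit = namedtuple("Hit", ["subject", "bitscore"])


def equivalent_hits(hits, percent_cutoff, min_bitscore=0.0):
    """Return the hits scoring within percent_cutoff percent of the top bitscore.

    An empty list means the read stays unassigned, either because it has no
    hits or because its best hit is below min_bitscore (an assumed filter).
    """
    if not hits:
        return []
    top = max(h.bitscore for h in hits)
    if top < min_bitscore:
        return []
    # Assumed interpretation: keep everything at or above this threshold.
    threshold = top * (1.0 - percent_cutoff / 100.0)
    return [h for h in hits if h.bitscore >= threshold]


example = [Hit("Vaccinia", 200.0), Hit("Variola", 195.0), Hit("Camelpox", 150.0)]
kept = equivalent_hits(example, percent_cutoff=5)
```

Under this reading, a 5 percent cutoff keeps the Vaccinia and Variola hits and drops Camelpox; the kept subjects would then feed the LCA step.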
  7. Example Analysis Method
     ● BLAST reads against host database
     ● Remove host reads
     ● BLAST unassigned against reference database
     ● Filter hits vs. unassigned
     ● Repeat...
     ● Post analysis
     (Figure: flow diagram. Sample reads → Host genome BLAST and Filtering → Non-host reads → Viral genome BLAST and Filtering → Bacterial genome BLAST and Filtering → Protozoan genome BLAST and Filtering → Fungal genome BLAST and Filtering → Pool results → Unique organisms in sample)
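
The staged analysis on slide 7, where reads left unassigned by one database are passed on to the next, can be sketched in Python as follows. This is an illustration under stated assumptions only: `run_stages` is a hypothetical name, and each `classify` callable stands in for a full BLAST-and-filter step against one reference database.

```python
def run_stages(reads, stages):
    """Pass reads through each stage in order.

    stages is a list of (name, classify) pairs; classify(read) returns a
    taxon label, or None when the read has no acceptable hit.
    Returns (assignments, leftover): a dict of stage name -> {read: label},
    and the list of reads that no stage could assign.
    """
    assignments = {}
    remaining = list(reads)
    for name, classify in stages:
        assigned, unassigned = {}, []
        for read in remaining:
            label = classify(read)
            if label is None:
                unassigned.append(read)
            else:
                assigned[read] = label
        assignments[name] = assigned
        remaining = unassigned  # only unassigned reads reach the next stage
    return assignments, remaining


# Toy lookup tables standing in for BLAST against each database (made-up data).
host_db = {"r1": "Homo sapiens"}
viral_db = {"r2": "Vaccinia", "r3": "Influenza A"}
bacterial_db = {"r4": "Streptococcus"}

stages = [("host", host_db.get), ("viral", viral_db.get), ("bacterial", bacterial_db.get)]
assignments, leftover = run_stages(["r1", "r2", "r3", "r4", "r5"], stages)

# Pool the non-host results to list the unique organisms in the sample.
organisms = sorted(
    {label for name, hits in assignments.items() if name != "host" for label in hits.values()}
)
```

Running the host stage first means host reads never reach the pathogen databases, which is the point of the subtraction step on the slide.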
  8. Pipeline Construction
  9. Pipeline Construction
  10. Pipeline Construction
  11. Pipeline Construction
  12. Pipeline Construction
  13. Pipeline Execution
     ● Custom execution manager
     ● Computes dependencies and monitors running jobs
     ● Distributes jobs across a Linux cluster
     ● Facilitates unattended clustered executions
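
The slides do not describe how the custom execution manager computes dependencies. As a rough sketch of the general idea, the Python standard library's `graphlib.TopologicalSorter` (Python 3.9+) can group jobs into waves that are safe to run in parallel; the job names and the `execution_waves` helper below are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group jobs into waves whose prerequisites all sit in earlier waves.

    dependencies maps each job name to the jobs it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could be dispatched now
        waves.append(ready)
        sorter.done(*ready)  # a real manager would mark jobs done as they finish
    return waves


# Hypothetical jobs: reads are split, BLASTed in chunks, then filtered.
jobs = {
    "split_reads": [],
    "blast_chunk_1": ["split_reads"],
    "blast_chunk_2": ["split_reads"],
    "filter_hits": ["blast_chunk_1", "blast_chunk_2"],
    "report": ["filter_hits"],
}
waves = execution_waves(jobs)
```

Here the two BLAST chunks land in the same wave, so a cluster scheduler could run them on separate nodes before the filtering job starts.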
  14. Reports
  15. Drill Down Reports
  16. Abundance View
     ● Displays abundance of taxonomic hits
  17. Example Run
     ● Mouth swab input samples
     ● Two pools:
        ● Samples spiked with Vaccinia and Influenza A
        ● Background reference sample
  18. Example Run
  19. Example Run
  20. Example Run
  21. Example Run
  22. Wrap-up
     ● Unbiased analysis of input reads
     ● Custom analysis pipelines
     ● Last common ancestor calculation
     ● Clustered execution
     ● Multiple report views
     ● Exportable results
  23. Acknowledgements
     ● Gary Van Domselaar
     ● Morag Graham
     ● Shaun Tyler
     ● Heather Kent
     ● Kim Melnychuk
     ● Christine Bonner
     ● Geoff Peters
     ● Philip Mabon
