I have sent you the SOrt-ITEMS PDF by mail. Imagine a situation where reads originate from a new genus closely related to Staphylococcus (but not Staphylococcus itself). Since sequences corresponding to this genus are absent from the database, hits will be reported to Staphylococcus and Bacilli. If I assign the read to Staphylococcus based on those hits, the assignment is wrong. Ideally it should have been made at a higher taxonomic level. Do you get the point? We took all these scenarios into consideration before designing SOrt-ITEMS.
Please go through the PDF I sent you by mail and send any feedback or questions.
Hi, I am actually looking for a way to modify the LCA algorithm to look at the score lengths. I'll be working with Matthews on this, and I think I get your argument.
So, for example, if the BLAST report for a read has high-scoring hits to Staphylococcus aureus, Staphylococcus sanguinis, and Staphylococcus epidermidis, the last common ancestor would be 'Staphylococcus'. We are interested in modifying the calculation to look at the distribution of score strengths and assignments to see if we can make a better assignment. For example, if we get 15 reads that hit the genus 'Staphylococcus' and one read that hits the class 'Bacilli', we could probably fold the Bacilli hit into the Staphylococcus genus, because the genus Staphylococcus is in the class Bacilli.
Could you please advise, and share the paper you mentioned that gives a better taxonomic assignment? Thanks.
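The folding idea in this comment could be sketched roughly as below. This is only an illustration of the proposal, not code from the pipeline: the function names, the `min_fraction` threshold, and the toy `PARENT` map (which skips the ranks between genus and class) are all assumptions made for the example.

```python
from collections import Counter

# Toy child -> parent map (assumption); intermediate ranks are omitted.
PARENT = {"Staphylococcus": "Bacilli"}

def ancestors(taxon, parent):
    """Return every ancestor of a taxon, nearest first."""
    found = []
    while taxon in parent:
        taxon = parent[taxon]
        found.append(taxon)
    return found

def fold_into_dominant(read_taxa, parent, min_fraction=0.9):
    """Move reads assigned to an ancestor of the dominant taxon down to it.

    Folding only happens when the dominant taxon holds at least
    min_fraction of all reads; otherwise the counts are returned as-is.
    """
    counts = Counter(read_taxa)
    if not counts:
        return counts
    dominant, dominant_reads = counts.most_common(1)[0]
    if dominant_reads / len(read_taxa) < min_fraction:
        return counts
    above = set(ancestors(dominant, parent))
    folded = Counter()
    for taxon, n in counts.items():
        folded[dominant if taxon in above else taxon] += n
    return folded
```

With 15 reads on 'Staphylococcus' and one on 'Bacilli', the dominant share is 15/16, so a threshold of 0.9 folds the Bacilli read into the genus while a threshold of 0.95 leaves it where it is.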
The Lowest Common Ancestor approach you wish to incorporate in your pipeline results in a lot of false positives, especially when the input sequences arise from new organisms (which is generally the case with metagenomic samples). The assignments also tend to lose specificity due to the LCA approach.
Please refer to the paper mentioned below for a better taxonomic assignment approach (algorithm):
SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences.
Pathogen Profiling Pipeline
A Metagenomics Tool for Rapid Identification of Pathogens from Clinical Specimens
Tom Matthews
National Microbiology Laboratory
Public Health Agency of Canada
thomas_matthews@phac-aspc.gc.ca
June 27, 2009 Pathogen Profiling Pipeline 1
M3 SIG – ISMB/ECCB 2009
Introduction
● With novel/emerging diseases, classical pathogen identification may not always produce results
● Advances in next-gen sequencing technology
● Characterize samples at genomic level
● Pathogen Profiling Pipeline
● Bioinformatics pipeline
● Analysis of host and microbial nucleic acids
Features
● Nucleotide and protein BLAST analysis
● Unbiased analysis of input reads
● Clustered execution
● Web front-end
● Custom analysis pipelines
● Easily viewed results
Filtering Overview
● BLAST analysis performed against reference sequence database
● Assigns hits according to cut-off criteria
● Calculate equivalent hits
● Clustered BLAST and filtering
Last Common Ancestor Estimation
● Uses equivalent hits for LCA calculation
● User specifies equivalent hit percentage cutoff
● NCBI taxonomy database for ancestor lookup
● Walks up taxonomy tree to find lowest intersection of all leaf nodes
● Unbiased approach
[Figure: taxonomy tree with leaf nodes Vaccinia, Variola, Taterapox, and Camelpox under their common ancestor Orthopoxvirus]
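The tree walk on this slide can be sketched as follows. This is a minimal illustration and not the pipeline's own code: the `PARENT` dictionary is a simplified, hand-written stand-in for the NCBI taxonomy database, and `lineage` and `lowest_common_ancestor` are hypothetical names.

```python
# Simplified child -> parent map standing in for the NCBI taxonomy (assumption).
PARENT = {
    "Vaccinia": "Orthopoxvirus",
    "Variola": "Orthopoxvirus",
    "Taterapox": "Orthopoxvirus",
    "Camelpox": "Orthopoxvirus",
    "Orthopoxvirus": "Poxviridae",
    "Poxviridae": "Viruses",
}

def lineage(taxon):
    """Return the path from a taxon up to the top of the tree, taxon first."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lowest_common_ancestor(taxa):
    """Return the lowest node that appears in the lineage of every taxon."""
    if not taxa:
        raise ValueError("need at least one taxon")
    lineages = [lineage(t) for t in taxa]
    shared = set(lineages[0]).intersection(*lineages[1:])
    # Walking up from the first taxon, the first shared node is the lowest one.
    for node in lineages[0]:
        if node in shared:
            return node
    raise ValueError("taxa share no ancestor in this tree")
```

Called with the four leaf nodes from the figure, the function returns 'Orthopoxvirus'; adding a hit from outside that genus would push the result further up the tree.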
Filtering Outputs
● Hits – High scoring reads passing filtering values
● Equivalent Hits – BLAST hits matching to within an assigned percentage of the top hit's bitscore
● Last Common Ancestors – Calculated (estimated) LCA of all the equivalent hits
● Unassigned – Passed to the next pipeline step
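One way to read the equivalent-hit definition above is sketched here. It assumes "within an assigned percentage" means a bitscore no lower than the top bitscore reduced by that percentage; the function name and the `(subject, bitscore)` tuple layout are hypothetical.

```python
def equivalent_hits(hits, percent_cutoff):
    """Keep hits whose bitscore is within percent_cutoff percent of the top hit.

    hits is a list of (subject, bitscore) tuples for a single read.
    """
    if not hits:
        return []
    top_score = max(score for _, score in hits)
    threshold = top_score * (1.0 - percent_cutoff / 100.0)
    return [(subject, score) for subject, score in hits if score >= threshold]
```

For example, with bitscores of 200, 195, and 150 and a 5 percent cutoff, only the first two hits count as equivalent and would be passed on to the LCA calculation.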
Example Analysis Method
● BLAST reads against host database
● Remove host reads
● BLAST unassigned against reference database
● Filter hits vs. unassigned
● Repeat...
● Post analysis
[Figure: Sample reads → Host genome (BLAST and Filtering) → Non-host reads → Viral, Bacterial, Protozoan, and Fungal genomes (BLAST and Filtering at each step) → Pool results → Unique organisms in sample]
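The staged method above, where each step passes its unassigned reads to the next, can be sketched like this. It is an assumption-laden illustration: each "database" is a plain dictionary from read ID to organism that stands in for a BLAST search plus filtering, and the names `run_stages` and `unique_organisms` are hypothetical.

```python
def run_stages(reads, stages):
    """Run reads through ordered stages; unassigned reads fall to the next stage.

    stages is an ordered list of (stage_name, lookup) pairs, where lookup maps
    a read ID to an organism and stands in for BLAST plus filtering.
    """
    assigned = {}
    unassigned = list(reads)
    for stage_name, lookup in stages:
        remaining = []
        for read in unassigned:
            if read in lookup:
                assigned[read] = (stage_name, lookup[read])
            else:
                remaining.append(read)
        unassigned = remaining
    return assigned, unassigned

def unique_organisms(assigned, host_stage="host"):
    """Pool results from every non-host stage into a set of organisms."""
    return {org for stage, org in assigned.values() if stage != host_stage}
```

Putting the host stage first mirrors the slide: host reads are removed before any reference database is searched, and whatever is still unassigned after the last stage is left for post analysis.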
Pipeline Construction
Pipeline Execution
● Custom execution manager
● Computes dependencies and monitors running jobs
● Distributes jobs across Linux cluster
● Facilitates unattended clustered executions
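The dependency computation mentioned above might look something like the sketch below. The slides do not describe how the execution manager works internally, so this is only a generic illustration using the standard-library `graphlib` module (Python 3.9 or later); the job names and the `execution_batches` function are hypothetical.

```python
from graphlib import TopologicalSorter

def execution_batches(dependencies):
    """Group jobs into batches whose members can run at the same time.

    dependencies maps each job name to the set of jobs it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the jobs depend on each other in a loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs with all prerequisites finished
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as complete
    return batches

# Hypothetical job graph: two BLAST chunks, then filtering, then LCA.
JOBS = {
    "blast_chunk_1": set(),
    "blast_chunk_2": set(),
    "filter": {"blast_chunk_1", "blast_chunk_2"},
    "lca": {"filter"},
}
```

Each batch holds jobs with no unmet dependencies, so a scheduler could hand a whole batch to the cluster at once and wait for it to finish before releasing the next.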
Reports
Drill Down Reports
Abundance View
● Displays abundance of taxonomic hits
Example Run
● Mouth swab input samples
● Two pools:
● Samples spiked with Vaccinia and Influenza A
● Background reference sample
Wrap-up
● Unbiased analysis of input reads
● Custom analysis pipelines
● Last common ancestor calculation
● Clustered execution
● Multiple report views
● Exportable results
Acknowledgements
● Gary Van Domselaar
● Morag Graham
● Shaun Tyler
● Heather Kent
● Kim Melnychuk
● Christine Bonner
● Geoff Peters
● Philip Mabon