HMMER 3 & Community Profiling

This is my first lab presentation as a post-doc in Jonathan Eisen's lab. I discuss the new features and changes in HMMER 3, and how I used the new version to identify Pfams in all 80 samples of the GOS metagenomic dataset, in the hope of testing whether "community profiling" may work.

Published in: Technology, Education

  1. HMMER 3 & Community Profiling
     Morgan Langille
     UC Davis
  2. HMMER 3 – What’s new?
     • Much faster
         • 100× HMMER 2
         • ≈ BLAST
     • More sensitive
  3. What’s new?
     • Alignment column confidence
         • Each residue is given a posterior probability annotation
         • * = 95-100%
         • 9 = 85-95%
         • 8 = 75-85%
         • etc.

     fn3           2 saPenlsvsevtstsltlsWsppkdgggpitgYeveyqekgegeewqevtvprtttsvtltgLepgteYefrVqavngagegp 84
                     saP ++ + ++ l ++W p + +gpi+gY++++++++++ + e+ vp+ s+ +++L++gt+Y++ + +n++gegp
     7LESS_DROME 439 SAPVIEHLMGLDDSHLAVHWHPGRFTNGPIEGYRLRLSSSEGNA-TSEQLVPAGRGSYIFSQLQAGTNYTLALSMINKQGEGP 520
                     78999999999*****************************9998.**********************************9997 PP
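The posterior probability (PP) line can be decoded mechanically. The following is a minimal Python sketch, assuming the digit code continues the pattern shown on the slide (each digit d covers d×10 ± 5%, with 0 meaning 0-5%, and "." marking a gap). The function names `pp_range` and `low_confidence_columns` and the 75% cutoff are illustrative choices, not part of HMMER.

```python
def pp_range(symbol):
    """Return the (low, high) posterior probability range in percent for one
    character of a PP line, or None for a gap column ('.')."""
    if symbol == "*":
        return (95, 100)
    if symbol == "0":
        return (0, 5)
    if symbol in "123456789":
        centre = int(symbol) * 10
        return (centre - 5, centre + 5)
    if symbol == ".":
        return None
    raise ValueError(f"unexpected PP symbol: {symbol!r}")


def low_confidence_columns(pp_line, min_percent=75):
    """List 0-based alignment columns whose lower PP bound is below min_percent."""
    flagged = []
    for column, symbol in enumerate(pp_line):
        bounds = pp_range(symbol)
        if bounds is not None and bounds[0] < min_percent:
            flagged.append(column)
    return flagged


# PP line from the fn3 / 7LESS_DROME alignment on the slide,
# written with run lengths for the '*' stretches to keep the line short.
PP_LINE = "78999999999" + "*" * 29 + "9998." + "*" * 34 + "9997"

print(low_confidence_columns(PP_LINE))
```

For this alignment only the first and last columns (both annotated `7`) fall below the cutoff, which matches the usual picture of alignment ends being the least certain part of a domain hit.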
  4. What’s new?
     • Sequence scores, not alignment scores
         • Scoring just the single best alignment can break down when the hit is a remote homolog
         • HMMER 3 scores sequences by integrating over alignment uncertainty
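The difference between scoring one best alignment and integrating over all alignments can be illustrated with the Viterbi and Forward recursions on a toy HMM. This is only a sketch: the two-state model, its state names, and all probabilities below are made up for illustration, it is not a profile HMM, and it works with raw probabilities rather than the log-odds bit scores HMMER reports.

```python
import itertools

# Hypothetical two-state HMM over a two-letter alphabet (all numbers invented).
STATES = ("match", "background")
START = {"match": 0.5, "background": 0.5}
TRANS = {
    "match": {"match": 0.8, "background": 0.2},
    "background": {"match": 0.2, "background": 0.8},
}
EMIT = {
    "match": {"A": 0.7, "C": 0.3},
    "background": {"A": 0.4, "C": 0.6},
}


def viterbi_and_forward(seq):
    """Return (probability of the single best path, probability summed over all paths)."""
    best = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    total = dict(best)
    for symbol in seq[1:]:
        # Viterbi keeps only the most probable way of reaching each state...
        best = {t: max(best[s] * TRANS[s][t] for s in STATES) * EMIT[t][symbol]
                for t in STATES}
        # ...while Forward sums over every way of reaching it.
        total = {t: sum(total[s] * TRANS[s][t] for s in STATES) * EMIT[t][symbol]
                 for t in STATES}
    return max(best.values()), sum(total.values())


def path_probability(path, seq):
    """Joint probability of one explicit state path and the sequence."""
    p = START[path[0]] * EMIT[path[0]][seq[0]]
    for prev, state, symbol in zip(path, path[1:], seq[1:]):
        p *= TRANS[prev][state] * EMIT[state][symbol]
    return p


def brute_force_total(seq):
    """Sum path_probability over every possible path (feasible only for tiny inputs)."""
    return sum(path_probability(path, seq)
               for path in itertools.product(STATES, repeat=len(seq)))


single_best, all_paths = viterbi_and_forward("ACCA")
print(f"best path: {single_best:.6f}  all paths: {all_paths:.6f}")
```

When many alternative paths have similar probability, as is typical for remote homologs, the best path carries only a small share of the total, so a score based on the sum retains signal that a best-path score discards.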
  5. Single Sequence Queries
     • phmmer ≈ BLASTP
         • Search a sequence against a sequence database.
     • jackhmmer ≈ PSI-BLAST
         • Iteratively search a sequence against a sequence database.
     • Internally, both build a profile HMM from the query sequence and then run an HMM search
  6. Small Changes
     • hmmpfam -> hmmscan
         • Search a sequence against a profile HMM database
     • hmmcalibrate -> built into hmmbuild
     • hmmpress
         • Creates binary HMM files so hmmscan is faster
         • Similar idea to formatting BLAST databases with formatdb
     • New output format options
         • --tblout (sequence score, best domain score)
         • --domtblout (sequence score, all domain scores with coordinates)
         • Gives a whitespace-delimited tabular output without alignments
         • 1/5 the file size of regular output
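Because the tabular output has no alignments, it is easy to parse. Below is a minimal Python sketch of a `--tblout` reader. It assumes the column layout described in the HMMER 3 user guide (18 whitespace-separated fields followed by a free-text description, with `#` comment lines); that layout may differ between releases, so it is worth checking against the header of a real file. The `Hit` tuple, `parse_tblout`, and the sample line (including its scores and accession version) are invented for illustration.

```python
from collections import namedtuple

Hit = namedtuple(
    "Hit", "target query full_evalue full_score best_dom_score description"
)

# Made-up example row in the assumed layout; the numbers are illustrative only.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias  E-value  score  bias  exp reg clu ov env dom rep inc description
fn3  PF00041.13  7LESS_DROME  -  1.9e-16  57.9  0.1  4.3e-16  56.8  0.1  1.5  1  0  0  1  1  1  1  Fibronectin type III domain
"""


def parse_tblout(lines):
    """Yield one Hit per data line, skipping blank lines and '#' comments."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # Split on any run of whitespace; the 19th piece keeps the description whole.
        fields = line.rstrip("\n").split(None, 18)
        yield Hit(
            target=fields[0],
            query=fields[2],
            full_evalue=float(fields[4]),     # full-sequence E-value
            full_score=float(fields[5]),      # full-sequence bit score
            best_dom_score=float(fields[8]),  # best single-domain bit score
            description=fields[18] if len(fields) > 18 else "",
        )


hits = list(parse_tblout(SAMPLE_TBLOUT.splitlines()))
for hit in hits:
    print(hit.query, hit.target, hit.full_score, hit.best_dom_score)
```

Splitting on whitespace rather than on a fixed delimiter keeps the reader tolerant of padded columns.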
  7. Upcoming changes
     • Parallelization
         • Multi-threaded, MPI (cluster), GPU
     • Translated comparisons
         • BLASTX, TBLASTN, TBLASTX
     • More input sequence formats
         • GenBank, EMBL, etc.
         • Clustal format
  8. Problems/Issues
     • hmmconvert
         • Used to convert HMMER2 profiles into HMMER3 profiles
         • Only converts the file format
         • Good: get HMMER3 speedup
         • Bad: get HMMER2 sensitivity/specificity
     • Should rebuild old HMMER2 HMMs using hmmbuild
  9. Glocal vs local alignments
     • Local
         • Any portion of the HMM can align to any portion of the sequence
     • Glocal
         • The entire HMM is aligned to any portion of the sequence
     • HMMER2
         • Had both, but local was not as sensitive as glocal
     • HMMER3
         • Local was improved to the point that glocal was thought unnecessary (and was not included in HMMER3)
         • However, some models do very poorly
         • Short, extremely diverse seed alignments, such as zinc finger transcription factors, may be missed
  10. Community Profiling
  11. Phylogenetic profiling
      • Wu, et al., PLOS Genetics, 2005
      • For C. hydrogenoformans, identified the presence or absence of homologs in all other completely sequenced genomes
      • Identified many hypothetical proteins that had the same profile as known sporulation proteins
  12. Community Profiling
      • KEGG
      • COG
      • DeLong, et al., Science, 2006
  13. Community Profiling
      • Look across multiple metagenomic samples
      • Gene families that have similar profiles may have similar function
      • Similar to using co-expression to identify similar functioning genes
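One way to make "similar profiles" concrete is to correlate per-sample abundance vectors, as is commonly done with co-expression data. The sketch below uses Pearson correlation, which is an assumption here since the slides do not say which similarity measure was used. The family names (`famA`, `famB`, `famC`) and counts are invented toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_by_similarity(profiles, query):
    """Return the other families sorted from most to least correlated with `query`."""
    scores = [
        (name, pearson(profiles[query], profile))
        for name, profile in profiles.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical counts of three gene families across four samples.
PROFILES = {
    "famA": [1, 2, 3, 4],
    "famB": [2, 4, 6, 8],   # rises and falls with famA
    "famC": [4, 3, 2, 1],   # opposite trend to famA
}

for name, r in rank_by_similarity(PROFILES, "famA"):
    print(f"{name}: r = {r:+.2f}")
```

Families that track each other across samples rank first, and those are the candidates for shared function under the community profiling idea.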
  14. So what have I done?
      • Downloaded the GOS peptide file
          • 41M sequences, 80 samples
          • 43GB -> 7GB, by removing extra information
          • Split into ~100 smaller files
      • Downloaded HMMER 3 Pfams (email request)
          • Containing 11098 Pfams
      • Ran hmmscan on genbeo
          • 4 days later
          • 12.5 M Pfam predictions
          • Some sequences contain >1 Pfam
          • 9643 Pfams
      • Used “cluster” to group genes and samples
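The split-and-scan step can be sketched in a few lines of Python. This is an illustration under assumptions, not the script used for the talk: records are dealt round-robin into chunks, and one `hmmscan` command line is built per chunk using the `--tblout` and `-o` options. The helper names and the file names (`Pfam-A.hmm`, `chunk_000.faa`, `chunk_000.tbl`) are hypothetical, and the commands are only constructed, not run.

```python
def split_fasta(lines, n_chunks):
    """Deal FASTA records round-robin into n_chunks lists.

    Each record is a list of lines: the '>' header followed by its sequence lines.
    """
    chunks = [[] for _ in range(n_chunks)]
    record, n_records = [], 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if record:
                chunks[n_records % n_chunks].append(record)
                n_records += 1
            record = [line]
        elif record and line:
            record.append(line)
    if record:
        chunks[n_records % n_chunks].append(record)
    return chunks


def hmmscan_command(hmm_db, fasta_path, tblout_path):
    """Build an hmmscan argument list that keeps only the tabular output."""
    return ["hmmscan", "--tblout", tblout_path, "-o", "/dev/null", hmm_db, fasta_path]


# Toy input standing in for the GOS peptide file.
TOY_FASTA = [">seq1", "MKV", ">seq2", "MLA", "GGT", ">seq3", "MST"]

chunks = split_fasta(TOY_FASTA, 2)
for index, chunk in enumerate(chunks):
    print(index, [record[0] for record in chunk])
print(" ".join(hmmscan_command("Pfam-A.hmm", "chunk_000.faa", "chunk_000.tbl")))
```

Each argument list could be handed to `subprocess.run` or written into a cluster job script, and the profile database would need to be prepared with `hmmpress` first.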
  15. Results
      • [Heatmap; axes: GOS metagenomic samples, Pfams]
      • Red = above avg. number of Pfams
      • Green = below avg. number of Pfams
      • Have not normalized
          • For number of sequences per sample
          • For number of Pfams
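The missing normalization for sample size could look like the sketch below: divide each Pfam count by the number of sequences in its sample, then centre each Pfam row on its own mean so that positive values correspond to "above average" and negative to "below average". Centring per Pfam row is an assumption about how the heatmap colours were derived, and the identifiers and counts are invented.

```python
def normalize_by_sample_size(counts, seqs_per_sample):
    """Convert raw hit counts into hits per sequence for every sample."""
    return {
        pfam: {sample: n / seqs_per_sample[sample] for sample, n in row.items()}
        for pfam, row in counts.items()
    }


def centre_rows(matrix):
    """Subtract each row's mean, so values are relative to that Pfam's average."""
    centred = {}
    for pfam, row in matrix.items():
        mean = sum(row.values()) / len(row)
        centred[pfam] = {sample: value - mean for sample, value in row.items()}
    return centred


# Hypothetical counts: sample s2 is twice as large as s1.
COUNTS = {
    "pfam_x": {"s1": 10, "s2": 20},   # same rate in both samples
    "pfam_y": {"s1": 10, "s2": 10},   # rarer in the larger sample
}
SEQS_PER_SAMPLE = {"s1": 100, "s2": 200}

rates = normalize_by_sample_size(COUNTS, SEQS_PER_SAMPLE)
centred = centre_rows(rates)
for pfam, row in centred.items():
    print(pfam, {sample: round(value, 4) for sample, value in row.items()})
```

In the raw counts `pfam_x` looks enriched in `s2`, but after normalization its two samples are identical, while `pfam_y` turns out to be relatively depleted in the larger sample.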
  16. Example of phage Pfams clustering together
  17. Future
      • Community Profiling
          • Include other (all) metagenomic samples
          • Try to group Pfams by GO category to see how strong the correlation is between branch length and function
          • Examine whether some functional categories are more easily predicted by this profiling strategy (e.g. HGTs)
      • Identify novel gene families and sub-families
          • Clustering genes, building HMMs, scanning, … repeat.
          • Community profiling may help in annotation of these
