HMMER 3 & Community Profiling

This is my first lab presentation during my post-doc in Jonathan Eisen's lab. I discuss new features and changes in HMMER 3. I also discuss how I used the new version to identify Pfams in all 80 samples of the GOS metagenomic datasets, in the hope of testing whether "community profiling" may work.

Usage Rights: CC Attribution-NonCommercial-ShareAlike License


    Presentation Transcript

    • HMMER 3 & Community Profiling
      Morgan Langille
      UC Davis
    • HMMER 3 – What’s new?
      Much faster
      ~100× faster than HMMER 2
      Speed roughly comparable to BLAST
      More sensitive
    • What’s new?
      Alignment column confidence
      Each residue is given a posterior probability annotation
      * = 95-100%
      9 = 85-95%
      8 = 75-85%
      etc.
      fn3 2 saPenlsvsevtstsltlsWsppkdgggpitgYeveyqekgegeewqevtvprtttsvtltgLepgteYefrVqavngagegp 84
      saP ++ + ++ l ++W p + +gpi+gY++++++++++ + e+ vp+ s+ +++L++gt+Y++ + +n++gegp
      7LESS_DROME 439 SAPVIEHLMGLDDSHLAVHWHPGRFTNGPIEGYRLRLSSSEGNA-TSEQLVPAGRGSYIFSQLQAGTNYTLALSMINKQGEGP 520
      78999999999*****************************9998.**********************************9997 PP
    • What’s new?
      Sequence scores, not alignment scores
      Scoring only the single best alignment can break down for remote homologs
      HMMER 3 scores sequences by integrating over alignment uncertainty (summing over all possible alignments)
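
The difference between scoring one best alignment and integrating over all alignments can be illustrated on a toy model. The sketch below is not HMMER's profile architecture: it is a hypothetical two-state HMM with invented probabilities, and it works with raw probabilities rather than the log-odds bit scores HMMER reports. It only shows that the best single path (Viterbi) carries part of the total probability that the sum over all paths (Forward) captures.

```python
# Toy two-state HMM. All states and probabilities are invented for illustration.
STATES = ("match", "background")
START = {"match": 0.5, "background": 0.5}
TRANS = {
    "match": {"match": 0.8, "background": 0.2},
    "background": {"match": 0.2, "background": 0.8},
}
EMIT = {
    "match": {"A": 0.7, "G": 0.3},
    "background": {"A": 0.5, "G": 0.5},
}


def path_probability(seq, combine):
    """Dynamic programming over state paths.

    combine=max keeps only the best path into each state (Viterbi);
    combine=sum adds up every path into each state (Forward).
    """
    prev = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for symbol in seq[1:]:
        prev = {
            s: EMIT[s][symbol] * combine(prev[r] * TRANS[r][s] for r in STATES)
            for s in STATES
        }
    return combine(prev.values())


def viterbi_probability(seq):
    return path_probability(seq, max)


def forward_probability(seq):
    return path_probability(seq, sum)


seq = "AGAAG"
best = viterbi_probability(seq)
total = forward_probability(seq)
print(f"best single path: {best:.6f}")
print(f"all paths summed: {total:.6f}")
print(f"share of probability in the best path: {best / total:.1%}")
```

When no single path dominates, as is more likely for a remote homolog, a score based on the best path alone discards most of the evidence, which is the motivation stated on the slide.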
    • Single Sequence Queries
      phmmer ≈ BLASTP
      Search a sequence against a sequence database.
      jackhmmer ≈ PSI-BLAST
      Iteratively search a sequence against a sequence database.
      Internally, both build a profile HMM from the query sequence and then run an HMM search
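
As a rough sketch of how the two single-sequence programs are invoked, the helpers below only assemble command lines. The file names are hypothetical placeholders, and the iteration count is an arbitrary choice for the example; `--tblout` and `-N` are the options for tabular output and the maximum number of jackhmmer iterations.

```python
import shlex


def phmmer_command(query_fasta, seq_db, tblout):
    """Command line for a BLASTP-like search: query sequence(s) vs. a sequence database."""
    return ["phmmer", "--tblout", tblout, query_fasta, seq_db]


def jackhmmer_command(query_fasta, seq_db, tblout, iterations=3):
    """Command line for a PSI-BLAST-like iterative search, capped at `iterations` rounds."""
    return ["jackhmmer", "-N", str(iterations), "--tblout", tblout, query_fasta, seq_db]


# Hypothetical file names, used only to show the argument order.
commands = [
    phmmer_command("query.fasta", "proteins.fasta", "phmmer_hits.tbl"),
    jackhmmer_command("query.fasta", "proteins.fasta", "jackhmmer_hits.tbl"),
]
for cmd in commands:
    print(" ".join(shlex.quote(part) for part in cmd))
```

With HMMER 3 installed, each list could be passed to `subprocess.run(cmd, check=True)` to run the search.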
    • Small Changes
      hmmpfam -> hmmscan
      Search a sequence against a profile HMM database
      hmmcalibrate -> built into hmmbuild
      hmmpress
      Creates binary HMM files so hmmscan is faster
      Similar idea to formatting BLAST databases with formatdb
      New output format options
      --tblout (sequence score, best domain score)
      --domtblout (sequence score, all domain scores with coordinates)
      Both give space-delimited tabular output without alignments
      About 1/5 the file size of regular output
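
A minimal sketch of reading the per-sequence table written by `--tblout` follows. It assumes the usual layout of 18 whitespace-separated fields followed by a free-text description, with comment lines starting with `#`. The two sample rows are invented for illustration and are not real results.

```python
from collections import namedtuple

Hit = namedtuple(
    "Hit", "target query seq_evalue seq_score best_dom_evalue best_dom_score"
)


def parse_tblout(lines):
    """Parse --tblout rows into Hit records, skipping comments and blank lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed fields, then a description that may itself contain spaces.
        fields = line.split(None, 18)
        hits.append(
            Hit(
                target=fields[0],
                query=fields[2],
                seq_evalue=float(fields[4]),
                seq_score=float(fields[5]),
                best_dom_evalue=float(fields[7]),
                best_dom_score=float(fields[8]),
            )
        )
    return hits


# Invented rows in the assumed column order.
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias  E-value  score  bias  exp reg clu ov env dom rep inc description
fn3     PF00041.13  seq_0001  -  1.9e-57  178.0  0.4  1.2e-16  47.2  0.9  9.4  4  1  3  9  9  8  8  Fibronectin type III domain
Pkinase PF00069.17  seq_0002  -  4.0e-30   92.5  0.0  5.1e-30  92.1  0.0  1.2  1  0  0  1  1  1  1  Protein kinase domain
"""

hits = parse_tblout(SAMPLE.splitlines())
for hit in hits:
    print(hit.query, hit.target, hit.seq_score, hit.best_dom_score)
```

In an hmmscan table the target is the profile (for example a Pfam) and the query is the sequence; in an hmmsearch table the roles are reversed.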
    • Upcoming changes
      Parallelization
      Multi-threaded, MPI (cluster), GPU
      Translated comparisons
      BLASTX, TBLASTN, TBLASTX
      More input sequence formats
      GenBank, EMBL, etc
      Clustal format
    • Problems/Issues
      hmmconvert
      Used to convert HMMER2 profiles into HMMER3 profiles
      Only converts the file format
      Good: get the HMMER3 speedup
      Bad: keep HMMER2 sensitivity/specificity
      Better to rebuild old HMMER2 HMMs using hmmbuild
    • Glocal vs. local alignments
      Local
      Any portion of the HMM can align to any portion of the sequence
      Glocal
      The entire HMM is aligned to any portion of the sequence
      HMMER2
      Had both, but local was not as sensitive as glocal
      HMMER3
      Local was improved to the point that glocal was thought unnecessary (so it was not included in HMMER3)
      However, some models do very poorly
      Models built from short, extremely diverse seed alignments, such as zinc finger transcription factors, may miss hits
    • Community Profiling
    • Phylogenetic profiling
      Wu, et al., PLOS Genetics, 2005
      For C. hydrogenoformans, identified the presence or absence of homologs in all other completely sequenced genomes
      Identified many hypothetical proteins that had the same profile as known sporulation proteins
    • Community Profiling
      KEGG
      COG
      DeLong, et al., Science, 2006
    • Community Profiling
      Look across multiple metagenomic samples
      Gene families that have similar profiles may have similar function
      Similar to using co-expression to identify genes with similar function
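
To make "similar profiles" concrete, here is a small sketch that compares gene families by the Pearson correlation of their per-sample counts. The family names and counts are invented, and Pearson correlation is only one possible similarity measure, chosen here as an assumption; the slides do not say which measure was used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


# Invented counts of three gene families across five metagenomic samples.
profiles = {
    "family_A": [12, 0, 7, 30, 1],
    "family_B": [10, 1, 6, 28, 0],
    "family_C": [0, 25, 3, 1, 18],
}

pairs = sorted(
    ((pearson(profiles[a], profiles[b]), a, b) for a, b in combinations(profiles, 2)),
    reverse=True,
)
for r, a, b in pairs:
    print(f"{a} vs {b}: r = {r:.3f}")
```

Families A and B rise and fall together across samples, so under the community profiling idea they would be candidates for related function; family C shows the opposite pattern.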
    • So what have I done?
      Downloaded the GOS peptide file
      41M sequences, 80 samples
      43GB -> 7GB, by removing extra information
      Split into ~100 smaller files
      Downloaded HMMER 3 Pfams (email request)
      Containing 11098 Pfams
      Ran hmmscan on genbeo
      4 days later
      12.5M Pfam predictions
      Some sequences contain >1 Pfam
      9643 distinct Pfams
      Used “cluster” to group genes and samples
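
The step between the hmmscan hits and the clustering is a table of Pfam counts per sample. A minimal sketch of that tabulation is below. It assumes the hits are available as (sequence ID, Pfam) pairs and that a sequence-to-sample lookup exists; both the lookup and all identifiers shown are hypothetical, since the slides do not describe how samples were recorded.

```python
from collections import Counter, defaultdict


def count_pfams_per_sample(hits, sample_of):
    """Tally how many times each Pfam was predicted in each sample."""
    counts = defaultdict(Counter)
    for seq_id, pfam in hits:
        counts[pfam][sample_of[seq_id]] += 1
    return counts


def to_matrix(counts, samples):
    """One row per Pfam, one column per sample (0 where a Pfam was not seen)."""
    return {
        pfam: [per_sample[sample] for sample in samples]
        for pfam, per_sample in sorted(counts.items())
    }


# Hypothetical inputs: a sequence may carry more than one Pfam.
hits = [
    ("seq1", "PF00041"),
    ("seq1", "PF00069"),
    ("seq2", "PF00041"),
    ("seq3", "PF00041"),
]
sample_of = {"seq1": "sample_01", "seq2": "sample_01", "seq3": "sample_02"}
samples = ["sample_01", "sample_02"]

matrix = to_matrix(count_pfams_per_sample(hits, sample_of), samples)
for pfam, row in matrix.items():
    print(pfam, row)
```

A matrix in this shape (rows = Pfams, columns = samples) is the kind of input a hierarchical clustering tool can group along both axes.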
    • Results
      [Heatmap: GOS metagenomic samples vs. Pfams]
      Red = above avg. number of Pfams
      Green = below avg. number of Pfams
      Have not normalized for
      Number of sequences per sample
      Number of Pfams
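
The normalization that the slide lists as not yet done could look something like the sketch below. This is one possible scheme, assumed for illustration: scale each count to hits per 1,000 sequences in its sample, then center each Pfam row on its own mean so that positive values correspond to "above average" (red) and negative values to "below average" (green). All numbers are invented.

```python
def normalize_by_sample_size(matrix, seqs_per_sample):
    """Convert raw counts to hits per 1,000 sequences in each sample."""
    return {
        pfam: [1000.0 * count / n for count, n in zip(row, seqs_per_sample)]
        for pfam, row in matrix.items()
    }


def center_rows(matrix):
    """Subtract each Pfam's mean so values are relative to its own average."""
    centered = {}
    for pfam, row in matrix.items():
        mean = sum(row) / len(row)
        centered[pfam] = [value - mean for value in row]
    return centered


# Invented raw counts for two Pfams in two samples of different sizes.
raw = {"PF_A": [20, 10], "PF_B": [5, 5]}
seqs_per_sample = [2000, 1000]

normalized = normalize_by_sample_size(raw, seqs_per_sample)
centered = center_rows(normalized)
for pfam in raw:
    print(pfam, raw[pfam], normalized[pfam], centered[pfam])
```

In this example PF_A looks twice as abundant in the first sample before normalization but is equally frequent in both once sample size is accounted for, while PF_B shows the reverse.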
    • Example of phage Pfams clustering together
    • Future
      Community Profiling
      Include other (all) metagenomic samples
      Try grouping Pfams by GO category to see how strong the correlation is between branch length and function
      Examine whether some functional categories are more easily predicted by this profiling strategy (e.g. HGTs)
      Identify novel gene families and sub-families
      Clustering genes, building HMMs, scanning, …repeat.
      Community profiling may help in annotation of these