HMMER 3 & Community Profiling. Morgan Langille, UC Davis.
HMMER 3 – What’s new? Much faster: about 100× faster than HMMER 2, roughly the speed of BLAST. More sensitive.
What’s new? Alignment column confidence. Each residue is given a posterior probability annotation (the PP line): * = 95-100%, 9 = 85-95%, 8 = 75-85%, etc.

            fn3   2 saPenlsvsevtstsltlsWsppkdgggpitgYeveyqekgegeewqevtvprtttsvtltgLepgteYefrVqavngagegp 84
                    saP   ++ +  ++ l ++W p +  +gpi+gY++++++++++  + e+ vp+ s+   +++L++gt+Y++ +  +n++gegp
    7LESS_DROME 439 SAPVIEHLMGLDDSHLAVHWHPGRFTNGPIEGYRLRLSSSEGNA-TSEQLVPAGRGSYIFSQLQAGTNYTLALSMINKQGEGP 520
                    78999999999*****************************9998.**********************************9997 PP
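The PP encoding on this slide can be decoded mechanically. The following is a minimal Python sketch, not HMMER code: the function names are made up for illustration, the range for `0` (0-5%) is an assumption extrapolated from the pattern on the slide, and `.` is assumed to mark a gap column with no aligned residue.

```python
def pp_to_range(ch):
    """Map one PP-line character to a (low, high) probability range in percent.

    '*' = 95-100, '9' = 85-95, '8' = 75-85, and so on down the digits.
    Returns None for '.', assumed here to mark a gap column.
    """
    if ch == "*":
        return (95, 100)
    if ch == ".":
        return None
    if len(ch) == 1 and ch in "0123456789":
        d = int(ch)
        if d == 0:
            return (0, 5)  # assumption: extrapolated from the slide's pattern
        return (d * 10 - 5, d * 10 + 5)
    raise ValueError(f"unexpected PP character: {ch!r}")


def low_confidence_columns(pp_line, threshold):
    """Return 0-based column indices whose lower bound is below threshold (percent)."""
    columns = []
    for i, ch in enumerate(pp_line):
        bounds = pp_to_range(ch)
        if bounds is not None and bounds[0] < threshold:
            columns.append(i)
    return columns
```

Applied to the PP line of the fn3 alignment above, a call such as `low_confidence_columns(pp, 85)` would pick out the columns annotated `7` or `8`, which sit at the ends of the domain and next to the gap.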
What’s new? Sequence scores, not alignment scores. Scoring just a single best alignment can break down if the sequence is a remote homolog. HMMER 3 instead scores sequences by integrating over alignment uncertainty.
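The difference between the two scoring styles can be illustrated with a toy calculation. This is only a sketch, assuming each alternative alignment of a sequence has a log-odds score in bits. It is not how HMMER computes its scores internally (that is done by dynamic programming over the profile HMM), and the numbers are invented for illustration.

```python
import math


def best_alignment_score(bit_scores):
    """Score of the single best alignment (the 'alignment score' view)."""
    return max(bit_scores)


def summed_score(bit_scores):
    """log2 of the summed odds over all alternative alignments.

    Each alignment with score s contributes odds 2**s, so the combined score
    is log2(sum(2**s)). The maximum is factored out for numerical stability.
    """
    m = max(bit_scores)
    return m + math.log2(sum(2.0 ** (s - m) for s in bit_scores))


# Clear homolog: one alignment dominates, so both views nearly agree.
clear = [40.0, 10.0, 5.0]
# Remote homolog: several equally plausible alignments, none convincing alone.
remote = [8.0, 8.0, 8.0, 8.0]

print(best_alignment_score(clear), summed_score(clear))
print(best_alignment_score(remote), summed_score(remote))
```

In the remote case the four 8-bit alignments together are worth 10 bits, 2 bits more than any one of them, which is the kind of signal a single best alignment discards.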
Single Sequence Queries. phmmer ≈ BLASTP: search a sequence against a sequence database. jackhmmer ≈ PSI-BLAST: iteratively search a sequence against a sequence database. Internally, both produce a profile HMM from the query sequence and then run an HMM search.
Small Changes. hmmpfam -> hmmscan: search a sequence against a profile HMM database. hmmcalibrate -> built into hmmbuild. hmmpress: creates binary HMM files so that hmmscan is faster (similar in idea to formatting BLAST databases with formatdb). New output format options: --tblout (sequence score, best domain score) and --domtblout (sequence score, all domain scores with coordinates). Both give tabular, column-delimited output without alignments, at about 1/5 the file size of regular output.
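A small Python sketch of how the `--tblout` option might be used from a script. The file names are placeholders and the sample line is invented for illustration. The column positions (target name, accession, query name, accession, then full-sequence E-value, score, bias, then best-domain E-value, score, bias) are assumed from the HMMER 3 user guide and should be checked against the header of your own output.

```python
def hmmpress_command(hmm_db):
    """Argument list for preparing a profile database for hmmscan."""
    return ["hmmpress", hmm_db]


def hmmscan_command(hmm_db, seq_file, tbl_path):
    """Argument list for an hmmscan run that also writes a --tblout table."""
    return ["hmmscan", "--tblout", tbl_path, hmm_db, seq_file]


def parse_tblout_line(line):
    """Parse one --tblout row into a dict, or return None for comments/blank lines.

    Fields are split on whitespace. The free-text description at the end of
    the row is ignored here.
    """
    if line.startswith("#") or not line.strip():
        return None
    fields = line.split()
    return {
        "target": fields[0],            # profile (e.g. Pfam) name
        "query": fields[2],             # query sequence name
        "seq_evalue": float(fields[4]),
        "seq_score": float(fields[5]),
        "best_domain_score": float(fields[8]),
    }


# Invented example row, only to show the shape of the data.
SAMPLE = "fn3 - 7LESS_DROME - 1.9e-57 178.0 0.4 1.2e-16 47.2 0.9 9.4 9 1 0 9 9 9 8 example description"
print(parse_tblout_line(SAMPLE))
```

The command lists could be passed to `subprocess.run(..., check=True)` on a machine where HMMER 3 is installed.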
Upcoming changes. Parallelization: multi-threaded, MPI (cluster), GPU. Translated comparisons (as in BLASTX, TBLASTN, TBLASTX). More input sequence formats: GenBank, EMBL, etc., and Clustal format.
Problems/Issues. hmmconvert is used to convert HMMER2 profiles into HMMER3 profiles, but it only converts the file format. Good: you get the HMMER3 speedup. Bad: you keep HMMER2 sensitivity/specificity. Old HMMER2 HMMs should be rebuilt using hmmbuild.
Glocal vs. local alignments. Local: any portion of the HMM can align to any portion of the sequence. Glocal: the entire HMM is aligned to any portion of the sequence. HMMER2 had both, but local was not as sensitive as glocal. In HMMER3, local was improved to the point that glocal was thought unnecessary (and was not included). However, some models do very poorly: short, extremely diverse seed alignments, such as zinc finger transcription factors, may be missed.
Community Profiling
Phylogenetic profiling (Wu, et al., PLOS Genetics, 2005). For C. hydrogenoformans, identified the presence or absence of homologs in all other completely sequenced genomes. Identified many hypothetical proteins that had the same profile as sporulation proteins.
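The core of phylogenetic profiling can be sketched in a few lines of Python: each gene gets a presence/absence vector across genomes, and genes with the same vector are grouped. The gene names and profiles below are hypothetical toy data, not results from Wu et al.

```python
from collections import defaultdict


def group_by_profile(profiles):
    """Group genes that share an identical presence/absence profile.

    profiles maps gene name -> sequence of 0/1 flags, one per genome
    (1 = a homolog is present in that genome). Only groups with more
    than one gene are returned.
    """
    groups = defaultdict(list)
    for gene, profile in profiles.items():
        groups[tuple(profile)].append(gene)
    return [sorted(genes) for genes in groups.values() if len(genes) > 1]


# Hypothetical toy profiles over four genomes.
toy_profiles = {
    "known_sporulation_gene": (1, 0, 1, 0),
    "hypothetical_protein_1": (1, 0, 1, 0),
    "housekeeping_gene": (1, 1, 1, 1),
}
print(group_by_profile(toy_profiles))
```

A hypothetical protein that lands in the same group as known sporulation genes becomes a candidate sporulation protein. Real analyses usually allow near matches rather than requiring identical profiles.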
Community Profiling. KEGG, COG (Delong, et al., Science, 2006).
Community Profiling. Look across multiple metagenomic samples. Gene families that have similar profiles may have similar functions. This is similar to using co-expression to identify genes with similar functions.
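One way to make "similar profiles" concrete is to correlate the per-sample abundance vectors of two gene families. The sketch below uses Pearson correlation as an assumed similarity measure. The talk does not say which measure was used, and the family names and counts are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence has zero variance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def most_similar(profiles, family):
    """Rank the other families by correlation with the given family's profile."""
    target = profiles[family]
    scored = [(pearson(target, prof), name)
              for name, prof in profiles.items() if name != family]
    return sorted(scored, reverse=True)


# Invented counts of three families across four samples.
toy = {
    "family_A": [10, 0, 5, 20],
    "family_B": [20, 0, 10, 40],   # same shape as family_A
    "family_C": [0, 30, 10, 0],    # roughly opposite shape
}
print(most_similar(toy, "family_A"))
```

Families that rank near the top for a well-annotated family would be candidates for sharing its function, in the same spirit as co-expression analysis.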
So what have I done? Downloaded the GOS peptide file: 41M sequences, 80 samples; reduced from 43GB to 7GB by removing extra information; split into ~100 smaller files. Downloaded the HMMER 3 Pfams (by email request), containing 11098 Pfams. Ran hmmscan on genbeo; 4 days later, 12.5M Pfam predictions (some sequences contain >1 Pfam) across 9643 Pfams. Used “cluster” to group genes and samples.
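The splitting step can be sketched as follows. This is an illustrative Python version, not the script used for the talk: it reads FASTA records from any iterable of lines and deals them round-robin into a chosen number of chunks, so each chunk can be scanned as a separate job.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line and header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_round_robin(records, n_chunks):
    """Distribute records over n_chunks lists, one record at a time."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


# Tiny made-up input, standing in for the large peptide file.
example_lines = [">a desc", "MKV", "LLA", ">b", "GG", ">c", "PP"]
for chunk in split_round_robin(read_fasta(example_lines), 2):
    print(chunk)
```

For a file the size of the GOS peptide set, the chunks would be written straight to disk rather than held in memory. The in-memory lists here only keep the example short.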
Results. Heatmap with axes GOS metagenomic samples and Pfams. Red = above avg. number of Pfams; green = below avg. number of Pfams. Not yet normalized for the number of sequences per sample or for the number of Pfams.
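The missing normalization for sample size could look like the sketch below: divide each Pfam count by the number of sequences in its sample, so that large samples do not appear uniformly "red". The data layout, names and numbers are assumptions for illustration only.

```python
def normalize_by_sample_size(counts, seqs_per_sample):
    """Convert raw Pfam hit counts into hits per sequence for each sample.

    counts maps pfam -> {sample: number of hits}.
    seqs_per_sample maps sample -> number of sequences in that sample.
    """
    normalized = {}
    for pfam, by_sample in counts.items():
        normalized[pfam] = {
            sample: hits / seqs_per_sample[sample]
            for sample, hits in by_sample.items()
        }
    return normalized


# Invented numbers: the same raw count means different things in
# a small sample and a large one.
toy_counts = {"pfam_X": {"sample_1": 10, "sample_2": 10}}
toy_sizes = {"sample_1": 100, "sample_2": 1000}
print(normalize_by_sample_size(toy_counts, toy_sizes))
```

After this step, "above average" and "below average" would be judged on per-sequence frequencies instead of raw counts.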
Example of phage Pfams clustering together
Future Community Profiling. Include other (all) metagenomic samples. Try to group Pfams by GO category to see how strong the correlation is between branch length and function. Examine whether some functional categories are more easily predicted by this profiling strategy (e.g. HGTs). Identify novel gene families and sub-families: clustering genes, building HMMs, scanning, … repeat. Community profiling may help in the annotation of these.