Increased HMMER3 performance on Genepool
William Arndt
August 2, 2016
- 1 -
The result first:
A production sized use case: the HMMER3 hmmsearch tool searching the Pfam 29.0 database (16k models) against the swissprot database (550k sequences):
1 thread: 25 hours
4 threads: 8 hours
32 threads: 8 hours
32 threads, sharded input files: 1 hour
NEW hmmsearch, 32 threads + HT: 27 minutes
- 2 -
What HMMER3 Does
- 3 -
Protein Homology Search
Start with a multiple sequence alignment (MSA) describing an interesting protein domain, profile, or motif. An MSA is used to build a hidden Markov model through which HMMER3 can search protein sequences for matches with statistical significance.
Compare millions of sequences against tens of thousands of protein HMMs, and use the results for annotation.
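As a concrete illustration with the standard HMMER3 tools (the file names here are placeholders):
hmmbuild 1-cysPrx_C.hmm 1-cysPrx_C.sto          # build a profile HMM from an MSA
hmmsearch 1-cysPrx_C.hmm uniprot_sprot.fasta    # search it against a protein database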
- 4 -
HMMER3 filter pipeline
The overwhelming majority of sequences don’t match. Speed is gained by discarding a miss as soon as possible.
• Filtering Pipeline (sketched in code after this list):
– Multiple Segment Viterbi filter: high-scoring diagonals; 2% pass; uses 25% of CPU time
– Viterbi filter: optimal alignment with indels; 5% pass; uses 15% of CPU time
– Forward/Backward filter: combined score of all alignments; 1% pass; uses 5% of CPU time
– Hit processing and output: 30% of CPU time
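A minimal C sketch of this early-exit cascade; the scoring functions, thresholds, and types below are toy placeholders, not the real HMMER3 API:

/* Toy illustration of the filter cascade: each stage is cheap relative
 * to the next, and a miss exits at the first stage it fails. */
#include <stdio.h>

typedef struct { const char *name; double toy_score; } Seq;
typedef struct { const char *name; } Hmm;

/* Stubs standing in for the real SIMD filter implementations. */
static double msv_score(const Seq *s, const Hmm *h) { (void)h; return s->toy_score; }
static double vit_score(const Seq *s, const Hmm *h) { (void)h; return s->toy_score; }
static double fwd_score(const Seq *s, const Hmm *h) { (void)h; return s->toy_score; }

/* Returns 1 if seq survives every filter against hmm, 0 on early exit. */
static int pipeline(const Seq *seq, const Hmm *hmm)
{
    if (msv_score(seq, hmm) < 10.0) return 0;  /* ~98% of sequences exit here */
    if (vit_score(seq, hmm) < 20.0) return 0;  /* ~95% of survivors exit here */
    if (fwd_score(seq, hmm) < 30.0) return 0;  /* ~99% of survivors exit here */
    printf("hit: %s vs %s\n", seq->name, hmm->name);  /* hit processing/output */
    return 1;
}

int main(void)
{
    Hmm model = { "1-cysPrx_C" };
    Seq seqs[] = { { "typical miss", 5.0 }, { "strong hit", 50.0 } };
    for (int i = 0; i < 2; i++) pipeline(&seqs[i], &model);
    return 0;
}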
- 5 -
HMMER3 output
Query: 1-cysPrx_C [M=40]
Accession: PF10417.6
Description: C-terminal domain of 1-Cys peroxiredoxin
Scores for complete sequences (score includes all domains):
--- full sequence --- --- best 1 domain --- -#dom-
E-value score bias E-value score bias exp N Sequence Description
------- ------ ----- ------- ------ ----- ---- -- -------- -----------
5.8e-18 69.2 1.8 1.1e-17 68.4 1.8 1.5 1 sp|O67024|TDXH_AQUAE Peroxiredo...
3.4e-15 60.4 0.0 9e-15 59.0 0.0 1.8 1 sp|Q9Y7F0|TSA1_CANAL Peroxiredo...
7.9e-14 56.0 0.0 1.5e-13 55.1 0.0 1.5 1 sp|Q26695|TDX_TRYBR Thioredoxi...
...
- 6 -
Domain annotation for each sequence:
>> sp|O67024|TDXH_AQUAE Peroxiredoxin OS=Aquifex aeolicus (strain VF5) GN=aq_858 PE=3 SV=1
# score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc
--- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ----
1 ! 68.4 1.8 2.9e-21 1.1e-17 1 40 [] 160 209 .. 160 209 .. 0.99
>> sp|Q9Y7F0|TSA1_CANAL Peroxiredoxin TSA1 OS=Candida albicans (strain SC5314 / ATCC MYA-2876) ...
# score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc
--- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ----
1 ! 59.0 0.0 2.4e-18 9e-15 1 40 [] 158 193 .. 158 193 .. 0.98
>> sp|Q26695|TDX_TRYBR Thioredoxin peroxidase OS=Trypanosoma brucei rhodesiense PE=2 SV=1
# score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc
--- ------ ----- --------- --------- ------- ------- ------- ------- ------- ------- ----
1 ! 55.1 0.0 4e-17 1.5e-13 1 39 [. 162 196 .. 162 197 .. 0.97
...
Why HMMER3 is inefficient on Genepool
- 7 -
HMMER3 memory scrooge
HMMER3 was engineered to be as portable as possible: it targets the small memory footprint of a 2010-era desktop or laptop, far less than what is available in an HPC environment.
Instead of reading a fasta file once and using memory to store it, HMMER3 goes back to disk over and over again. The overhead limits the rate at which data can be prepared, and that rate is slower than the rate at which multiple threads can consume it. Any more than 4 worker threads will sit idle waiting for data.
- 8 -
Counting I/O instructions
- 9 -
sqascii_Read() and header_fasta() are the sequence reading functions. Standard hmmsearch spends 25% of its compute reading the same sequence file over and over again.
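The slide's numbers came from profiling; one way to reproduce this kind of function-level attribution is Linux perf (a sketch only; the input file names are assumptions):
# Record where CPU time goes, then list the hottest symbols.
perf record -g -o hmmsearch.perf hmmsearch Pfam-A.hmm uniprot_sprot.fasta > /dev/null
perf report -i hmmsearch.perf --sort symbol | head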
Utilization of Genepool nodes
• Core Utilization
– Genepool has nodes with 16 or 32 cores
– HMMER3 can use no more than 4 cores efficiently
– All threads wait for stragglers after every model
– Mitigation options include:
• Ignore the problem
• Share a node with -pe pe_slots 4 plus --cpu 3
• Shard input files, run multiple hmmsearch processes on one node, then combine output (see the sketch after this list)
• Memory Utilization
– All Genepool nodes have more than 100GB of memory
– HMMER3 won’t use 95% of that unless you do something absurd like search TITIN against its own model.
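A hedged sketch of the sharding mitigation; the awk splitter, shard count, and file names are illustrative, not from the slides:
# Split the fasta into 8 shards, round-robin by record.
awk -v n=8 '/^>/ { f = sprintf("shard%d.fa", i++ % n) } { print > f }' uniprot_sprot.fasta
# Run one 4-thread hmmsearch per shard on the same node (--cpu counts worker threads).
for s in shard*.fa; do
    hmmsearch --cpu 3 --tblout "${s%.fa}.tbl" Pfam-A.hmm "$s" > /dev/null &
done
wait
# Combine the per-shard tables.
cat shard*.tbl > combined.tbl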
- 10 -
Modified HMMER3
- 11 -
Buffer the I/O data and reuse it
Store several models and their results in a memory buffer so that each sequence read from disk can be searched against multiple models. This divides the number of sequence-related disk accesses by the size of the model buffer; the 25% of CPU instructions spent on sequence I/O drops to under 1%.
Two buffers can alternate: I/O is performed on one while computation runs on the other. If I/O finishes early, that thread converts itself into a worker (see the sketch below).
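A minimal C sketch of the alternating buffers, assuming pthreads; the types and function names are placeholders, not the hpc_hmmsearch source:

/* Two buffers swap roles each pass: a reader thread fills one from disk
 * while the main thread searches the other. */
#include <pthread.h>

typedef struct { int count; /* buffered sequences would live here */ } SeqBuffer;

static SeqBuffer bufs[2];

/* Placeholder I/O: in HMMER3 this is the fasta reader. */
static void *fill_from_disk(void *arg)
{
    SeqBuffer *b = arg;
    b->count = 0;  /* read up to --seq_buffer sequences into b */
    return NULL;
}

/* Placeholder compute: every buffered sequence against every buffered model.
 * In the real design, if fill_from_disk() finishes first, its thread would
 * join in here as an extra worker instead of idling. */
static void search_buffer(SeqBuffer *b) { (void)b; }

int main(void)
{
    pthread_t reader;
    int active = 0;
    fill_from_disk(&bufs[active]);           /* prime the first buffer */
    for (int pass = 0; pass < 3; pass++) {   /* demo: three passes */
        pthread_create(&reader, NULL, fill_from_disk, &bufs[1 - active]);
        search_buffer(&bufs[active]);        /* compute overlaps the I/O */
        pthread_join(reader, NULL);
        active = 1 - active;                 /* swap roles */
    }
    return 0;
}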
- 12 -
Original HMMER3 thread behavior
- 13 -
New thread behavior
- 14 -
How to use custom hmmsearch
- 15 -
warndt@genepool13:~$ module load hmmer/3.1b2-opt
warndt@genepool13:~$ hpc_hmmsearch -h
# hpc_hmmsearch :: search profile(s) against a sequence database, custom modified for improved thread performance
# HMMER 3.1b2 (February 2015); http://hmmer.org/
...
Input buffer and thread control:
--seq_buffer <n> : set # of sequences per thread buffer [200000] (n>=1)
--hmm_buffer <n> : set # of hmms per thread hmm buffer [500] (n>=1)
--cpu <n> : set # of threads [1] (n>=1)
...
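For example, a full-node run might look like the following, assuming hpc_hmmsearch keeps hmmsearch's -o flag and <hmmfile> <seqdb> argument order (file names are illustrative):
hpc_hmmsearch --cpu 32 --seq_buffer 200000 --hmm_buffer 500 \
    -o pfam_vs_sprot.out Pfam-A.hmm uniprot_sprot.fasta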
Future work
- 16 -
HMMER3 on other NERSC systems
Cori phase I hardware is functionally identical (Haswell processors with 128GB memory) to the -pe pe_slots 32 nodes available on Genepool. No custom HMMER3 module on Cori yet, but that can be fixed in 5 minutes when someone wants it.
HMMER3 runs on Cori phase II hardware (Knights Landing many-core architecture) but not as well as on phase I. My current best KNL time for swissprot against Pfam is 38 minutes.
- 17 -
hmmscan modification
JGI usage of hmmscan is approximately an order of magnitude less than hmmsearch.
Its design is very similar to hmmsearch; conversion would be straightforward and take approximately a week. As soon as someone expresses interest in running high volume hmmscan, I'll complete the conversion and make it available.
- 18 -
Upgrading vector code
The six-year-old single instruction, multiple data (SIMD) instructions in the HMMER3 pipeline do not run well on KNL hardware.
I am currently working on new filters that use more modern vector instructions and will run more efficiently on the phase II machine.
- 19 -
HMMER4 is coming
• Sean Eddy has been actively developing a new major version of HMMER.
• The components I am hacking for better performance today will be completely replaced in the future with theoretically superior algorithms.
• It won’t be available for at least a year, and probably more like two or three.
• If I’m still around, I’ll help everyone transition to the new application.
- 20 -
HMMER3 translated search
Translated, frameshift-aware HMMER3 search is currently in development. An alpha version is available, and anyone interested is welcome to give it a try and provide feedback.
/global/homes/w/warndt/edison-t-hmmer/hmmer/src/phmmert
/global/homes/w/warndt/edison-t-hmmer/hmmer/src/nhmmscant
- 21 -
National Energy Research Scientific Computing Center
- 22 -