Poster presented at PEAC21.
The poster contains the complete scaling plots for both unweighted and weighted normalized Unifrac compute for sample sizes ranging from 1k to 307k on both GPUs and CPUs.
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Comparing GPU effectiveness for Unifrac distance compute
1. Comparing GPU effectiveness for Unifrac distance compute
Consumer-grade NVIDIA GPU
Unweighted Unifrac Weighted Normalized Unifrac
Problem size – Number of Samples
Runtime
–
In
Seconds
Runtime – In Seconds
UniFrac is a phylogenetic measure of beta-diversity
that assesses differences between pairs of microbiome
profiles. UniFrac is useful for microbial community
analysis because it can account for the evolutionary
relationships between microbes present within a sample.
Unifrac 0.20.2, often referred to as Hybrid Unifrac, can
run on either CPUs or GPUs. Most of the compute can
be performed using either integer or fp32 compute,
making it ideal for consumer-grade GPUs.
Igor Sfiligoi, Rob Knight, Daniel McDonald, Tom DeFanti, Frank Würthwein, John Graham and Dima Mishin - University of California San Diego
The PRP is a distributed, Kubernetes based infrastructure
that specializes in providing access to consumer-grade
GPUs. We thus tested Hybrid Unifrac on several of the
available models, to assess the relative effectiveness of
the various models.
Two server-grade GPUs (NVIDIA A40 and A100) and
two CPUs (Intel Xeon Gold 6230 and AMD EPYC 7252)
have also been benchmarked on PRP for comparison.
Conclusion:
There is very little difference between a consumer-grade
RTX 3090 and the server-grade A40 and A100 for the
unweighted Unifrac. The A100 is however significantly
faster on the weighted normalized version; the A40 is
instead slightly slower than the 3090 there.
The older-generation RTX2080TI is also a strong
contender on smaller problems, while both the GTX GPUs
and the CPUs are significantly slower.
https://pacificresearchplatform.org
This work was partially funded by
the US National Research
Foundation (NSF) under grants
DBI-2038509, OAC-1826967,
OAC-1541349 and CNS-1730158.
Almost identical speed: RTX 3090, A40 and A100
Relative slowdown compared to 3090: 1.5x 2080TI, 2.5x 1080TI, 4.5x 1070
Relative slowdown compared to 1070: 5x Xeon Gold 6230, 6x EPYC 7252
Relative slowdown compared to A100: 1.5x RTX 3090, 1.6x A40.
Relative slowdown compared to 3090: 1.8x 2080TI, 3.6x 1080TI, 4.9x 1070
Relative slowdown compared to 1070: 3x Xeon Gold 6230, 5.5x EPYC 7252
Problem size – Number of Samples
Runtime – In Seconds
At #samples = 100k
Server-grade NVIDIA GPU Server-grade CPU
Intel AMD
At #samples = 100k
https://github.com/biocore/unifrac
2. Comparing GPU effectiveness
for Unifrac distance compute
UniFrac is a phylogenetic measure of beta-
diversity that assesses differences between pairs
of microbiome profiles. UniFrac is useful for
microbial community analysis because it can
account for the evolutionary relationships
between microbes present within a sample.
Unifrac 0.20.2, often referred to as Hybrid
Unifrac, can run on either CPUs or GPUs. Most of
the compute can be performed using either integer
or fp32 compute, making it ideal for consumer-
grade GPUs.
Igor Sfiligoi, Rob Knight, Daniel McDonald, Tom DeFanti, Frank Würthwein, John Graham and Dima Mishin
University of California San Diego
The PRP is a distributed, Kubernetes based infrastructure that specializes in providing
access to consumer-grade GPUs. We thus tested Hybrid Unifrac on several of the
available models, to assess the relative effectiveness of the various models.
Two server-grade GPUs (NVIDIA A40 and A100) and two CPUs (Intel Xeon Gold
6230 and AMD EPYC 7252) have also been benchmarked on PRP for comparison.
https://pacificresearchplatform.org
https://github.com/biocore/unifrac
This work was partially funded by the US
National Research Foundation (NSF) under grants
DBI-2038509, OAC-1826967, OAC-1541349 and
CNS-1730158.
3. Comparing GPU effectiveness
for Unifrac distance compute
Igor Sfiligoi, Rob Knight, Daniel McDonald, Tom DeFanti, Frank Würthwein, John Graham and Dima Mishin
University of California San Diego
This work was partially funded by the US National Research Foundation (NSF) under grants DBI-2038509, OAC-1826967, OAC-1541349 and CNS-1730158.
Unweighted Unifrac Weighted Normalized Unifrac
Problem size – Number of Samples
Runtime
–
In
Seconds
Problem size – Number of Samples
Consumer-grade NVIDIA GPU
Server-grade NVIDIA GPU Server-grade CPU
Intel AMD
4. Comparing GPU effectiveness
for Unifrac distance compute
Igor Sfiligoi, Rob Knight, Daniel McDonald, Tom DeFanti, Frank Würthwein, John Graham and Dima Mishin
University of California San Diego
This work was partially funded by the US National Research Foundation (NSF) under grants DBI-2038509, OAC-1826967, OAC-1541349 and CNS-1730158.
Unweighted Unifrac Weighted Normalized Unifrac
Runtime – In Seconds
Almost identical speed: RTX 3090, A40 and A100
Relative slowdown compared to 3090: 1.5x 2080TI, 2.5x 1080TI, 4.5x 1070
Relative slowdown compared to 1070: 5x Xeon Gold 6230, 6x EPYC 7252
Relative slowdown compared to A100: 1.5x RTX 3090, 1.6x A40.
Relative slowdown compared to 3090: 1.8x 2080TI, 3.6x 1080TI, 4.9x 1070
Relative slowdown compared to 1070: 3x Xeon Gold 6230, 5.5x EPYC 7252
Runtime – In Seconds
At #samples = 100k
Runtime – In Seconds
5. Comparing GPU effectiveness
for Unifrac distance compute
Igor Sfiligoi, Rob Knight, Daniel McDonald, Tom DeFanti, Frank Würthwein, John Graham and Dima Mishin
University of California San Diego
This work was partially funded by the US
National Research Foundation (NSF) under grants
DBI-2038509, OAC-1826967, OAC-1541349 and
CNS-1730158.
Conclusion:
There is very little difference between a consumer-grade RTX 3090
and the server-grade A40 and A100 for the unweighted Unifrac.
The A100 is however significantly faster on the weighted
normalized version; the A40 is instead slightly slower than the
3090 there.
The older-generation RTX2080TI is also a strong contender on
smaller problems, while both the GTX GPUs and the CPUs are
significantly slower.