Comparing GPU effectiveness for Unifrac distance compute
[Figure: runtime in seconds versus problem size (number of samples), with one panel for Unweighted Unifrac and one for Weighted Normalized Unifrac. Legend entry: consumer-grade NVIDIA GPU.]
UniFrac is a phylogenetic measure of beta-diversity
that assesses differences between pairs of microbiome
profiles. It is useful for microbial community
analysis because it accounts for the evolutionary
relationships between microbes present within a sample.
UniFrac 0.20.2, often referred to as Hybrid Unifrac, can
run on either CPUs or GPUs. Most of the computation can
be performed using either integer or fp32 arithmetic,
making it well suited to consumer-grade GPUs, whose fp64
throughput is typically much lower than their fp32 throughput.
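To make the definition concrete, here is a minimal Python sketch of unweighted UniFrac on a toy tree: the distance is the branch length leading to microbes found in only one of the two samples, divided by the branch length leading to microbes found in either sample. The tree encoding (a list of `(length, tips_below)` pairs), the tip names and the function name are assumptions made for illustration; this is not how Hybrid Unifrac is implemented.

```python
def unweighted_unifrac(branches, sample_a, sample_b):
    """Toy unweighted UniFrac between two samples.

    branches: iterable of (branch_length, set_of_tips_below_branch).
    sample_a, sample_b: sets of tip names observed in each sample.
    """
    unique = 0.0    # branch length seen by exactly one sample
    observed = 0.0  # branch length seen by at least one sample
    for length, tips in branches:
        in_a = bool(tips & sample_a)
        in_b = bool(tips & sample_b)
        if in_a or in_b:
            observed += length
            if in_a != in_b:
                unique += length
    # Two empty samples share nothing to compare; report zero distance.
    return unique / observed if observed else 0.0


# Hypothetical four-tip tree: ((t1, t2), (t3, t4)).
TOY_BRANCHES = [
    (1.0, {"t1"}),
    (1.0, {"t2"}),
    (0.5, {"t1", "t2"}),
    (1.0, {"t3"}),
    (1.0, {"t4"}),
    (0.5, {"t3", "t4"}),
]

if __name__ == "__main__":
    d = unweighted_unifrac(TOY_BRANCHES, {"t1", "t2"}, {"t2", "t3"})
    print(f"toy distance: {d:.3f}")
```

For the two toy samples, 2.5 of the 4.0 units of observed branch length belong to only one sample, so the distance is 0.625. Identical samples give 0 and samples with no shared branches give 1.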
Igor Sfiligoi, Rob Knight, Daniel McDonald, Tom DeFanti, Frank Würthwein, John Graham and Dima Mishin - University of California San Diego
The Pacific Research Platform (PRP) is a distributed, Kubernetes-based
infrastructure that specializes in providing access to consumer-grade
GPUs. We therefore tested Hybrid Unifrac on several of the
available GPU models to assess their relative effectiveness.
Two server-grade GPUs (NVIDIA A40 and A100) and
two CPUs (Intel Xeon Gold 6230 and AMD EPYC 7252)
were also benchmarked on the PRP for comparison.
Conclusion:
There is very little difference between the consumer-grade
RTX 3090 and the server-grade A40 and A100 for
unweighted Unifrac. The A100 is, however, significantly
faster on the weighted normalized version (at 100k samples the
RTX 3090 is 1.5x slower); the A40 is slightly slower than
the 3090 there (1.6x slower than the A100).
The older-generation RTX 2080TI is also a strong
contender on smaller problems, while both the GTX GPUs
(1080TI and 1070) and the CPUs are significantly slower.
https://pacificresearchplatform.org
This work was partially funded by
the US National Science
Foundation (NSF) under grants
DBI-2038509, OAC-1826967,
OAC-1541349 and CNS-1730158.
Results at #samples = 100k

Unweighted Unifrac:
  Almost identical speed: RTX 3090, A40 and A100
  Relative slowdown compared to 3090: 1.5x 2080TI, 2.5x 1080TI, 4.5x 1070
  Relative slowdown compared to 1070: 5x Xeon Gold 6230, 6x EPYC 7252

Weighted Normalized Unifrac:
  Relative slowdown compared to A100: 1.5x RTX 3090, 1.6x A40
  Relative slowdown compared to 3090: 1.8x 2080TI, 3.6x 1080TI, 4.9x 1070
  Relative slowdown compared to 1070: 3x Xeon Gold 6230, 5.5x EPYC 7252

[Figure: runtime in seconds at #samples = 100k for both metrics. Legend entries: server-grade NVIDIA GPU; server-grade CPU (Intel, AMD).]
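The slowdowns are quoted in steps (CPUs relative to the 1070, the 1070 relative to the 3090), so a direct CPU-to-GPU comparison requires multiplying the steps. A minimal Python sketch of that arithmetic follows, assuming the rounded ratios from the poster can simply be chained; the dictionary layout and the helper names `chain` and `cpu_vs_3090` are illustrative only, and the products inherit the rounding of the inputs.

```python
from math import prod

# Slowdown factors quoted on the poster at 100k samples (rounded values).
UNWEIGHTED = {
    "1070_vs_3090": 4.5,
    "cpu_vs_1070": {"Xeon Gold 6230": 5.0, "EPYC 7252": 6.0},
}
WEIGHTED_NORMALIZED = {
    "3090_vs_a100": 1.5,
    "1070_vs_3090": 4.9,
    "cpu_vs_1070": {"Xeon Gold 6230": 3.0, "EPYC 7252": 5.5},
}


def chain(*factors):
    """Multiply successive slowdown ratios into one end-to-end ratio."""
    return prod(factors)


def cpu_vs_3090(table):
    """Approximate slowdown of each CPU relative to the RTX 3090."""
    return {
        cpu: chain(table["1070_vs_3090"], factor)
        for cpu, factor in table["cpu_vs_1070"].items()
    }


if __name__ == "__main__":
    for name, table in (("unweighted", UNWEIGHTED),
                        ("weighted normalized", WEIGHTED_NORMALIZED)):
        for cpu, ratio in cpu_vs_3090(table).items():
            print(f"{name}: {cpu} is roughly {ratio:.1f}x slower than the RTX 3090")
```

Under these assumptions the CPUs come out roughly 22x to 27x slower than the RTX 3090 on the unweighted metric and roughly 15x to 27x slower on the weighted normalized one. For the weighted normalized metric, a further factor of 1.5 relates the RTX 3090 to the A100.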
https://github.com/biocore/unifrac