Utility of Graphics Processing Units
for dense matrix calculations in
computing and inverting genomic
relationship matrices
Karin Meyer and Bruce Tier
Animal Genetics and Breeding Unit,
University of New England,
Armidale, Australia
AAABG 2013
GPUs for GRMs | Introduction
Computing in the genomics age
Requirements for mixed model analyses have changed!
Pre-genomics:
mixed model equations sparse (few non-zero elements)
– inverse of pedigree based relationship matrix (NRM)
concentrated on exploiting sparsity in computations
Now: Genomic relationship matrix (GRM)
dense (large proportion of elements non-zero)
require manipulation of large, dense matrices to
– compute the GRM
– invert the GRM
– predict breeding values in single-step approach
computationally demanding
exploit new hardware and software libraries
– computing in parallel
Objectives
A first look to examine scope for
speeding up computations to calculate & invert GRM
using parallel computations
exploiting a Graphics Processing Unit (GPU)
“An excursion into the land of computer gaming,
modern compilers and software libraries”
What is a GPU?
Graphics Processing Unit
initially: co-processor for computer gaming
– graphics card
– fast recalculation of pixel values
– remember: i387 math processor for i386
now: GP-GPU adapted to General Purpose computing
– hundreds to thousands of cores
– capable of double precision calculations
– turns desktop PC into personal super-computer
– but: limited memory (currently 6GB max.)
specialised interface for application programming
– CUDA (proprietary to NVIDIA) −→ Fortran interface
– OpenCL (C++ like)
Tesla K20X
2688 cores
6 GB RAM
Peak: 3.95/1.31 Tflops (single/double precision)
GPUs for GRMs | Calculating the GRM
Memory needed: $G = ZZ^\top$
Gigabytes – single precision matrices

     n    G (n × n)            Z (n × s)
                      s = 100K   s = 500K   s = 1000K
 5 000       0.1         1.9        9.3        18.6
10 000       0.4         3.7       18.6        37.3
20 000       1.5         7.5       37.3        74.5
30 000       3.4        11.2       55.9       111.8
50 000       9.3        18.6       93.1       186.3
−→ break calculations into blocks
−→ use out-of-core storage
→ parallel computing literature
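The figures in the memory table can be reproduced with a few lines of arithmetic. The sketch below assumes 4 bytes per single precision element and 1 gigabyte = 2^30 bytes, which is the convention the tabulated values are consistent with; the function names `gib` and `grm_memory` are illustrative only.

```python
def gib(n_elements, bytes_per_element=4):
    """Storage for a dense array in gigabytes (taken as 2**30 bytes)."""
    return n_elements * bytes_per_element / 2**30


def grm_memory(n, s, bytes_per_element=4):
    """Return (gigabytes for G of size n x n, gigabytes for Z of size n x s)."""
    return gib(n * n, bytes_per_element), gib(n * s, bytes_per_element)


# Print one column of the table: s = 500 000 SNPs, single precision.
for n in (5_000, 10_000, 20_000, 30_000, 50_000):
    g, z = grm_memory(n, 500_000)
    print(f"n = {n:6d}: G = {g:5.1f} GB, Z = {z:5.1f} GB")
```

Passing `bytes_per_element=8` gives the double precision requirements, which are twice as large.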
Blockwise multiplication
Z: n animals × s SNPs
G: symmetric, $sn(n+1)/2$ flops
All-in-one
$G = ZZ^\top$
Column-wise division
$Z = (Z_{.1} \; \cdots \; Z_{.4})$
$G = Z_{.1}Z_{.1}^\top + \cdots + Z_{.4}Z_{.4}^\top = \sum_k Z_{.k}Z_{.k}^\top$
Blockwise multiplication - cont’
Row-wise division
$G_{ij} = Z_{i.}Z_{j.}^\top$
Row- and columnwise division
$G_{ij} = \sum_k Z_{ik}Z_{jk}^\top$
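The two ways of dividing Z can be sketched in NumPy as follows. This is a minimal CPU illustration of the blocking idea only, not the code behind the slides: the function names and the `block_size` parameter are assumptions, and a real implementation would pass each block product to SSYRK/SGEMM (or their GPU counterparts) and read blocks of Z from out-of-core storage.

```python
import numpy as np


def grm_by_column_blocks(Z, block_size):
    """Column-wise division: accumulate G as the sum over k of Z.k Z.k'."""
    n, s = Z.shape
    G = np.zeros((n, n), dtype=Z.dtype)
    for start in range(0, s, block_size):
        Zk = Z[:, start:start + block_size]   # n x (block of SNPs)
        G += Zk @ Zk.T                        # rank-k update of G
    return G


def grm_by_row_blocks(Z, block_size):
    """Row-wise division: build G block by block, G_ij = Z_i. Z_j.'."""
    n = Z.shape[0]
    G = np.empty((n, n), dtype=Z.dtype)
    for i in range(0, n, block_size):
        Zi = Z[i:i + block_size]
        # G is symmetric, so only blocks with j <= i are computed.
        for j in range(0, i + 1, block_size):
            Gij = Zi @ Z[j:j + block_size].T
            G[i:i + block_size, j:j + block_size] = Gij
            G[j:j + block_size, i:i + block_size] = Gij.T
    return G


# Small demonstration with random data standing in for SNP genotypes.
rng = np.random.default_rng(0)
Z_demo = rng.standard_normal((7, 23))
print(np.allclose(grm_by_column_blocks(Z_demo, 5), Z_demo @ Z_demo.T))
print(np.allclose(grm_by_row_blocks(Z_demo, 3), Z_demo @ Z_demo.T))
```

Column blocks keep all of G in memory but only part of Z; row blocks keep only one block of G at a time. Combining both gives the row- and columnwise scheme.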
GPUs for GRMs | Inverting the GRM
Blockwise inversion
$$G = \begin{pmatrix} G_{PP} & G_{PC} & G_{PT} \\ G_{CP} & G_{CC} & G_{CT} \\ G_{TP} & G_{TC} & G_{TT} \end{pmatrix}$$
1. Subdivide matrix into blocks
C current −→ chosen size
P previous
T trailing
2. Factor & invert chol($G_{CC}$)
3. Adjust $G_{PP}$ & $G_{PC}$
4. ‘Absorb’ into $G_{TT}$ & $G_{CT}$
5. Calculate inverse
6. Redefine C, P & T; repeat
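Gauss-Jordan Elimination (transposes written for a lower-triangular Cholesky factor, $G_{CC} = LL^\top$)
$G_{CC} := \mathrm{chol}(G_{CC})$
$G_{CC} := G_{CC}^{-1}$
$G_{PC} := G_{PC} G_{CC}^\top$
$G_{PP} := G_{PP} + G_{PC} G_{PC}^\top$
$G_{CT} := G_{CC} G_{CT}$
$G_{TT} := G_{TT} - G_{CT}^\top G_{CT}$
$G_{PT} := G_{PT} - G_{PC} G_{CT}$
$G_{CT} := -(G_{CC}^\top G_{CT})$
$G_{PC} := G_{PC} G_{CC}$
$G_{CC} := G_{CC}^\top G_{CC}$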
−→ break matrix products into blocks as needed
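The six steps can be sketched in NumPy as below. This is an illustrative reading of the slide rather than the authors' code: it assumes a lower-triangular Cholesky factor (as returned by `np.linalg.cholesky`), works in double precision on a full copy of the matrix, updates only the upper-triangular blocks, and mirrors them at the end. The function name and `block_size` are hypothetical.

```python
import numpy as np


def blocked_gauss_jordan_inverse(G, block_size):
    """Invert a symmetric positive definite matrix block by block."""
    A = np.array(G, dtype=float)      # work on a copy
    n = A.shape[0]
    for c0 in range(0, n, block_size):
        c1 = min(c0 + block_size, n)
        # 1. Subdivide: P previous, C current, T trailing.
        P, C, T = slice(0, c0), slice(c0, c1), slice(c1, n)
        # 2. Factor the current block and invert the factor.
        L = np.linalg.cholesky(A[C, C])   # G_CC = L L'
        W = np.linalg.inv(L)              # inverse of G_CC is W' W
        # 3. Adjust G_PP and G_PC.
        A[P, C] = A[P, C] @ W.T
        A[P, P] += A[P, C] @ A[P, C].T
        # 4. Absorb the current block into G_TT (and update G_PT).
        A[C, T] = W @ A[C, T]
        A[T, T] -= A[C, T].T @ A[C, T]
        A[P, T] -= A[P, C] @ A[C, T]
        # 5. Complete the inverse for the current block row and column.
        A[C, T] = -(W.T @ A[C, T])
        A[P, C] = A[P, C] @ W
        A[C, C] = W.T @ W
        # 6. The next pass of the loop redefines P, C and T.
    # Only the upper-triangular blocks were maintained; mirror them.
    return np.triu(A) + np.triu(A, 1).T


# Small demonstration on a random symmetric positive definite matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((11, 40))
G_demo = X @ X.T + 11 * np.eye(11)
G_inv = blocked_gauss_jordan_inverse(G_demo, 4)
print(np.allclose(G_inv @ G_demo, np.eye(11)))
```

Every step other than the factorisation of the small block $G_{CC}$ is a dense matrix product, which is why the work maps well onto multi-threaded BLAS or a GPU.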
GPUs for GRMs | Computing environment
Hardware
‘Standard’ multi-core CPU
– Intel i7-3930K
– 3.2 GHz, 12M cache
– 64GB RAM
– 6 cores
GPU
– Nvidia GTX 580
– 512 cores
– 3GB RAM
– capable of double precision calculations
→ high end gaming card, not GP-GPU
Software
Linux (CentOS 6)
gfortran, gcc 4.4.7; Intel composer XE 13.0.1
Intel MKL library 11.0
– BLAS: SSYRK, SGEMM, STRMM
– LAPACK: SPOTRF, SPOTRI, STRTRI
CUDA 5.0
– CUBLAS: CUBLAS_SSYRK, CUBLAS_SGEMM,
CUBLAS_STRTRI, CUBLAS_STRMM
CULA dense R16a
– CULA_SPOTRF, CULA_SPOTRI, CULA_STRTRI
MAGMA 1.3
– MAGMAF_SLAUUM
GPUs for GRMs | Results
Time to ‘make’ GRM∗
[Figure: system time (min, log scale from 10^-1 to 10^3) against no. of animals (0 to 80 000) for GPU, CPU6 and CPU1, with reference lines at 1 hour and 5 hours. Inset: ratios GPU / CPU1 and CPU6 / CPU1, axis ticks 5 to 20.]
∗ Single precision calculations, s = 500 000
Time to invert GRM†
[Figure: system time (min, log scale from 10^-4 to 10^2) against no. of animals (0 to 80 000) for GPU, CPU6 and CPU1, with a reference line at 1 hour. Inset: ratios GPU / CPU1 and CPU6 / CPU1, axis ticks 5 to 15.]
† Single precision calculations
Time to invert GRM‡
[Figure: system time (min, log scale from 10^-3 to 10^2) against no. of animals (0 to 80 000) for GPU, CPU6 and CPU1, with a reference line at 1 hour. Inset: ratios GPU / CPU1 and CPU6 / CPU1, axis ticks 4 to 8.]
‡ Double precision calculations
GPUs for GRMs | Results | Conclusions
Conclusions
Build & invert GRM
−→ dense matrix multiplications
−→ highly parallelisable
BLAS and LAPACK library routines awesome
– highly optimised
– exploit memory architecture of modern processors
– multi-threaded versions
GPU: powerful new hardware for parallel computing
– thousands of processors
– can use multiple GPUs simultaneously
– BLAS/LAPACK etc. libraries available
– but: limited memory −→ blocked algorithms
– but: specialised interface, not much Fortran support yet
K. M. | 15 / 16
GPUs for GRMs | Results | Conclusions
K. M. | 16 / 16

More Related Content

What's hot

GPU
GPUGPU
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ..."NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
Edge AI and Vision Alliance
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
Chakkrit (Kla) Tantithamthavorn
 
Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1
Gao Boyang
 
Ultra Fast SOM using CUDA
Ultra Fast SOM using CUDAUltra Fast SOM using CUDA
Ultra Fast SOM using CUDA
QuEST Global (erstwhile NeST Software)
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Kohei KaiGai
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with Gpu
Rohit Khatana
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overview
Rajiv Kumar
 
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
Beniamino Murgante
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)
Fatima Qayyum
 
Graphics processing unit
Graphics processing unitGraphics processing unit
Graphics processing unit
Shashwat Shriparv
 
Memory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsMemory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware Accelerators
SepidehShirkhanzadeh
 
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
Koichi Shirahata
 
improve deep learning training and inference performance
improve deep learning training and inference performanceimprove deep learning training and inference performance
improve deep learning training and inference performance
s.rohit
 
cnsm2011_slide
cnsm2011_slidecnsm2011_slide
cnsm2011_slide
rerngvit yanggratoke
 
Resource Management with Systemd and cgroups
Resource Management with Systemd and cgroupsResource Management with Systemd and cgroups
Resource Management with Systemd and cgroups
Tsung-en Hsiao
 
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC PlatformsProtecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Heechul Yun
 
Graphics Processing Units
Graphics Processing UnitsGraphics Processing Units
Graphics Processing Units
Amrik Sadhra
 
Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...
Fisnik Kraja
 
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
Zalando adtech lab
 

What's hot (20)

GPU
GPUGPU
GPU
 
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ..."NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1
 
Ultra Fast SOM using CUDA
Ultra Fast SOM using CUDAUltra Fast SOM using CUDA
Ultra Fast SOM using CUDA
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with Gpu
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overview
 
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)
 
Graphics processing unit
Graphics processing unitGraphics processing unit
Graphics processing unit
 
Memory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsMemory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware Accelerators
 
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
 
improve deep learning training and inference performance
improve deep learning training and inference performanceimprove deep learning training and inference performance
improve deep learning training and inference performance
 
cnsm2011_slide
cnsm2011_slidecnsm2011_slide
cnsm2011_slide
 
Resource Management with Systemd and cgroups
Resource Management with Systemd and cgroupsResource Management with Systemd and cgroups
Resource Management with Systemd and cgroups
 
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC PlatformsProtecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
 
Graphics Processing Units
Graphics Processing UnitsGraphics Processing Units
Graphics Processing Units
 
Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...
 
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
 

Viewers also liked

Hipertensi¢n arterial
Hipertensi¢n arterialHipertensi¢n arterial
Hipertensi¢n arterial
Clariza Manrique
 
7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterol7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterolmegan508
 
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gònnhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gònmaxwell700
 
โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์
nuttavorn kesornsiri
 
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Hiroaki Wagatsuma
 
Piano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano BicoccaPiano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano Bicocca
Giulia Caliandro
 
cần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcmcần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcmsung379
 
çİkolatali tatlilar
çİkolatali tatlilarçİkolatali tatlilar
çİkolatali tatlilar
htcuysl13
 
nhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcmnhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcmkraig109
 
Penalized maximum likelihood estimates of genetic covariance matrices wit...
  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...
Penalized maximum likelihood estimates of genetic covariance matrices wit...
prettygully
 
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gònnhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gònmitch422
 
ISA 2015
ISA  2015 ISA  2015
ISA 2015
Mariely Cuevas
 
The pitch
The pitchThe pitch
The pitch
Terence Smith
 
Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?
Selena Norcini
 

Viewers also liked (15)

Hipertensi¢n arterial
Hipertensi¢n arterialHipertensi¢n arterial
Hipertensi¢n arterial
 
7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterol7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterol
 
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gònnhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
 
Com59
Com59Com59
Com59
 
โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์
 
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
 
Piano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano BicoccaPiano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano Bicocca
 
cần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcmcần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcm
 
çİkolatali tatlilar
çİkolatali tatlilarçİkolatali tatlilar
çİkolatali tatlilar
 
nhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcmnhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcm
 
Penalized maximum likelihood estimates of genetic covariance matrices wit...
  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...
Penalized maximum likelihood estimates of genetic covariance matrices wit...
 
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gònnhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
 
ISA 2015
ISA  2015 ISA  2015
ISA 2015
 
The pitch
The pitchThe pitch
The pitch
 
Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?
 

Similar to Slides meyer116

Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
fcassier
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
Kohei KaiGai
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010
John Holden
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modeling
nadikari123
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
HBaseCon
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
Mail.ru Group
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
Sri Ambati
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
Open-NFP
 
GPU
GPUGPU
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
inside-BigData.com
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
NECST Lab @ Politecnico di Milano
 
Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoS
Rohit Jnagal
 
GPUrdma - Presentation
GPUrdma - PresentationGPUrdma - Presentation
GPUrdma - Presentation
Feras Daoud
 
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxPACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
ssuser30e7d2
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)
Amal R
 
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudPart 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Univa, an Altair Company
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDA
Jungsoo Nam
 
Hotspot gc
Hotspot gcHotspot gc
Hotspot gc
Michał Warecki
 
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Tarik Reza Toha
 
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Preferred Networks
 

Similar to Slides meyer116 (20)

Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modeling
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
 
GPU
GPUGPU
GPU
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
 
Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoS
 
GPUrdma - Presentation
GPUrdma - PresentationGPUrdma - Presentation
GPUrdma - Presentation
 
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxPACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)
 
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudPart 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDA
 
Hotspot gc
Hotspot gcHotspot gc
Hotspot gc
 
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
 
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
 

Recently uploaded

Cytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptxCytokines and their role in immune regulation.pptx
Cytokines and their role in immune regulation.pptx
Hitesh Sikarwar
 
Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...Authoring a personal GPT for your research and practice: How we created the Q...
Authoring a personal GPT for your research and practice: How we created the Q...
Leonel Morgado
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.Micronuclei test.M.sc.zoology.fisheries.
Micronuclei test.M.sc.zoology.fisheries.
Aditi Bajpai
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
molar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptxmolar-distalization in orthodontics-seminar.pptx
molar-distalization in orthodontics-seminar.pptx
Anagha Prasad
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
PRIYANKA PATEL
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Slides meyer116

  • 1. Utility of Graphics Processing Units for dense matrix calculations in computing and inverting genomic relationship matrices Karin Meyer and Bruce Tier Animal Genetics and Breeding Unit, University of New England, Armidale, Australia AAABG 2013
  • 2. GPUs for GRMs | Introduction: Computing in the genomics age
    Requirements for mixed model analyses have changed!
    Pre-genomics:
      – mixed model equations sparse (few non-zero elements), using the inverse of the pedigree-based relationship matrix (NRM)
      – concentrated on exploiting sparsity in computations
    Now: genomic relationship matrix (GRM)
      – dense (large proportion of elements non-zero)
      – requires manipulation of large, dense matrices to compute the GRM, invert the GRM, and predict breeding values in the single-step approach
      – computationally demanding
      – exploit new hardware and software libraries: computing in parallel
  • 3. Introduction: Objectives
    A first look to examine the scope for
      – speeding up computations to calculate & invert the GRM
      – using parallel computations
      – exploiting a Graphics Processing Unit (GPU)
    “An excursion into the land of computer gaming, modern compilers and software libraries”
  • 4. Introduction: What is a GPU?
    Graphics Processing Unit
    initially: co-processor for computer gaming
      – graphics card
      – fast recalculation of pixel values
      – remember: i387 math processor for i386
    now: GP-GPU adapted to General Purpose computing
      – hundreds to thousands of cores
      – capable of double precision calculations
      – turns desktop PC into personal super-computer
      – but: limited memory (currently 6 GB max.)
    specialised interface for application programming
      – CUDA (proprietary to NVIDIA) → Fortran interface
      – OpenCL (C++-like)
  • 5. Introduction: Tesla K20X
      – 2688 cores
      – 6 GB RAM
      – Peak: 3.95 / 1.31 Tflops (single / double precision)
  • 6. Calculating the GRM: Memory needed for $G = ZZ'$
    Gigabytes, single precision matrices (n animals, s SNPs)

         n     | G (n×n) | Z (n×s), s = 100K | s = 500K | s = 1000K
       5 000   |   0.1   |        1.9        |    9.3   |    18.6
      10 000   |   0.4   |        3.7        |   18.6   |    37.3
      20 000   |   1.5   |        7.5        |   37.3   |    74.5
      30 000   |   3.4   |       11.2        |   55.9   |   111.8
      50 000   |   9.3   |       18.6        |   93.1   |   186.3

    → break calculations into blocks
    → use out-of-core storage
      → parallel computing literature
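The memory figures on slide 6 follow from simple arithmetic. The sketch below reproduces them, assuming 4 bytes per single precision element and a gigabyte of 2^30 bytes (the convention under which the slide's numbers come out); the function name `matrix_gib` is hypothetical.

```python
GIB = 1024 ** 3  # assumption: the slide's "gigabyte" is 2**30 bytes


def matrix_gib(rows, cols, bytes_per_element=4):
    """Storage for a dense rows x cols matrix, in units of 2**30 bytes."""
    return rows * cols * bytes_per_element / GIB


# One row per number of animals n: G (n x n), then Z (n x s) for three SNP counts.
for n in (5_000, 10_000, 20_000, 30_000, 50_000):
    sizes = [matrix_gib(n, n)]
    sizes += [matrix_gib(n, s) for s in (100_000, 500_000, 1_000_000)]
    print(f"{n:>6}", " ".join(f"{x:7.1f}" for x in sizes))
```

Passing `bytes_per_element=8` gives the double precision requirement, which is twice as large.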
  • 7. Calculating the GRM: Blockwise multiplication
    Z: n animals × s SNPs; G: symmetric, sn(n + 1)/2 flops
    All-in-one: $G = ZZ'$
    Column-wise division (Z split into column blocks $Z_{.1}, \dots, Z_{.4}$):
      $G = Z_{.1}Z_{.1}' + \dots + Z_{.4}Z_{.4}' = \sum_k Z_{.k}Z_{.k}'$
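The column-wise division on slide 7 can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' implementation (which relies on BLAS routines); the function names and the `block_cols` parameter are hypothetical. Only one column block of Z is needed at a time, while G is accumulated.

```python
def gram(z):
    """Z Z' for a matrix stored as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in z] for ri in z]


def gram_by_column_blocks(z, block_cols):
    """Accumulate G = sum_k Z_.k Z_.k' over blocks of block_cols columns."""
    n, s = len(z), len(z[0])
    g = [[0.0] * n for _ in range(n)]
    for start in range(0, s, block_cols):
        # Column block Z_.k: every row, columns start .. start + block_cols - 1.
        z_k = [row[start:start + block_cols] for row in z]
        part = gram(z_k)
        for i in range(n):
            for j in range(n):
                g[i][j] += part[i][j]
    return g
```

For any `block_cols`, `gram_by_column_blocks(z, block_cols)` should agree with `gram(z)`; in an out-of-core setting each `z_k` would be read from disk rather than sliced from memory.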
  • 8. Calculating the GRM: Blockwise multiplication (cont'd)
    Row-wise division (Z split into row blocks): $G_{ij} = Z_{i.}Z_{j.}'$
    Row- and column-wise division: $G_{ij} = \sum_k Z_{ik}Z_{jk}'$
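The row- and column-wise division on slide 8 computes G tile by tile. The sketch below is a hedged pure-Python illustration with hypothetical names (`gram_by_tiles`, `block_rows`, `block_cols`). As an added assumption, it exploits the symmetry of G by computing only the tiles on or above the diagonal and mirroring them afterwards.

```python
def gram_by_tiles(z, block_rows, block_cols):
    """G_ij = sum_k Z_ik Z_jk', computed per tile for j >= i, then mirrored."""
    n, s = len(z), len(z[0])
    g = [[0.0] * n for _ in range(n)]
    row_starts = range(0, n, block_rows)
    for i0 in row_starts:
        for j0 in row_starts:
            if j0 < i0:
                continue  # lower tiles follow from symmetry
            for k0 in range(0, s, block_cols):
                # Contribution of column block k to tile (i, j).
                for i in range(i0, min(i0 + block_rows, n)):
                    z_i = z[i][k0:k0 + block_cols]
                    for j in range(j0, min(j0 + block_rows, n)):
                        z_j = z[j][k0:k0 + block_cols]
                        g[i][j] += sum(a * b for a, b in zip(z_i, z_j))
    # Copy the upper triangle into the lower triangle.
    for i in range(n):
        for j in range(i):
            g[i][j] = g[j][i]
    return g
```

Each tile needs only two row blocks and one column block of Z at a time, which is what makes the scheme suit a device with limited memory.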
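  • 9. Inverting the GRM: Blockwise inversion
    $G = \begin{pmatrix} G_{PP} & G_{PC} & G_{PT} \\ G_{CP} & G_{CC} & G_{CT} \\ G_{TP} & G_{TC} & G_{TT} \end{pmatrix}$
    1. Subdivide matrix into blocks: C current (chosen size), P previous, T trailing
    2. Factor & invert chol($G_{CC}$)
    3. Adjust $G_{PP}$ & $G_{PC}$
    4. ‘Absorb’ into $G_{TT}$ & $G_{CT}$
    5. Calculate inverse
    6. Redefine C, P & T; repeat
    Gauss-Jordan elimination (transposes placed assuming chol(·) returns the upper triangular factor U with $G_{CC} = U'U$):
      $G_{CC} := \mathrm{chol}(G_{CC})$
      $G_{CC} := G_{CC}^{-1}$
      $G_{PC} := G_{PC}G_{CC}$
      $G_{PP} := G_{PP} + G_{PC}G_{PC}'$
      $G_{CT} := G_{CC}'G_{CT}$
      $G_{TT} := G_{TT} - G_{CT}'G_{CT}$
      $G_{PT} := G_{PT} - G_{PC}G_{CT}$
      $G_{CT} := -(G_{CC}G_{CT})$
      $G_{PC} := G_{PC}G_{CC}'$
      $G_{CC} := G_{CC}G_{CC}'$
    → break matrix products into blocks as needed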
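The blockwise inversion on slide 9 can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the authors' code: the transpose marks are not legible in the transcript, so they are placed here as for an upper triangular Cholesky factor U with G_CC = U'U, and all function names are hypothetical. Only the upper triangle is kept consistent during the sweep and is mirrored at the end.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(r, c)) for c in zip(*b)] for r in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def chol_upper(a):
    """Upper triangular U with a = U'U (a symmetric positive definite)."""
    n = len(a)
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        u[i][i] = (a[i][i] - sum(u[k][i] ** 2 for k in range(i))) ** 0.5
        for j in range(i + 1, n):
            u[i][j] = (a[i][j] - sum(u[k][i] * u[k][j] for k in range(i))) / u[i][i]
    return u


def inv_upper(u):
    """Inverse of an upper triangular matrix by back substitution."""
    n = len(u)
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        w[j][j] = 1.0 / u[j][j]
        for i in range(j - 1, -1, -1):
            w[i][j] = -sum(u[i][k] * w[k][j] for k in range(i + 1, j + 1)) / u[i][i]
    return w


def get(g, rows, cols):
    return [[g[i][j] for j in cols] for i in rows]


def store(g, rows, cols, block, scale=1.0, accumulate=False):
    """Write scale * block into g[rows, cols], optionally adding to old values."""
    for bi, i in enumerate(rows):
        for bj, j in enumerate(cols):
            old = g[i][j] if accumulate else 0.0
            g[i][j] = old + scale * block[bi][bj]


def block_invert(g, block_size):
    """Invert a symmetric positive definite matrix, block_size pivots at a time."""
    n = len(g)
    g = [[float(x) for x in row] for row in g]  # work on a copy
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        P, C, T = range(start), range(start, end), range(end, n)
        w = inv_upper(chol_upper(get(g, C, C)))  # W = U^{-1}
        wt = transpose(w)
        if P:
            pc = matmul(get(g, P, C), w)  # G_PC := G_PC W
            store(g, P, P, matmul(pc, transpose(pc)), accumulate=True)
        if T:
            ct = matmul(wt, get(g, C, T))  # G_CT := W' G_CT
            store(g, T, T, matmul(transpose(ct), ct), scale=-1.0, accumulate=True)
            if P:
                store(g, P, T, matmul(pc, ct), scale=-1.0, accumulate=True)
            store(g, C, T, matmul(w, ct), scale=-1.0)  # G_CT := -(W G_CT)
        if P:
            store(g, P, C, matmul(pc, wt))  # G_PC := G_PC W'
        store(g, C, C, matmul(w, wt))  # G_CC := W W'
    for i in range(n):  # mirror the upper triangle
        for j in range(i):
            g[i][j] = g[j][i]
    return g
```

For a small positive definite matrix, `matmul(g, block_invert(g, 2))` should be close to the identity for any block size. In the slides, the corresponding products are delegated to BLAS and LAPACK routines rather than written out by hand.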
  • 10. Computing environment: Hardware
    ‘Standard’ multi-core CPU
      – Intel i7-3930K
      – 3.2 GHz, 12 MB cache
      – 64 GB RAM
      – 6 cores
    GPU
      – Nvidia GTX 580
      – 512 cores
      – 3 GB RAM
      – capable of double precision calculations
      → high-end gaming card, not GP-GPU
  • 11. Computing environment: Software
    Linux (CentOS 6)
    gfortran, gcc 4.4.7; Intel Composer XE 13.0.1
    Intel MKL library 11.0
      – BLAS: SSYRK, SGEMM, STRTRI, STRMM
      – LAPACK: SPOTRF, SPOTRI
    CUDA 5.0
      – CUBLAS: CUBLAS_SSYRK, CUBLAS_SGEMM, CUBLAS_STRTRI, CUBLAS_STRMM
    CULA dense R16a
      – CULA_SPOTRF, CULA_SPOTRI, CULA_STRTRI
    MAGMA 1.3
      – MAGMAF_SLAUUM
  • 13. Results: Time to ‘make’ GRM (single precision calculations, s = 500 000)
    [Figure: system time (min, log scale from 10^-1 to 10^3) against no. of animals (0 to 80 000) for GPU, CPU6 and CPU1, with reference lines at 1 hour and 5 hours. Second panel: GPU / CPU1 and CPU6 / CPU1, axis ticks at 5, 10, 15 and 20.]
  • 15. Results: Time to invert GRM (single precision calculations)
    [Figure: system time (min, log scale from 10^-4 to 10^2) against no. of animals (0 to 80 000) for GPU, CPU6 and CPU1, with a reference line at 1 hour. Second panel: GPU / CPU1 and CPU6 / CPU1, axis ticks at 5, 10 and 15.]
  • 16. Results: Time to invert GRM (double precision calculations)
    [Figure: system time (min, log scale from 10^-3 to 10^2) against no. of animals (0 to 80 000) for GPU, CPU6 and CPU1, with a reference line at 1 hour. Second panel: GPU / CPU1 and CPU6 / CPU1, axis ticks at 4, 6 and 8.]
  • 19. Conclusions
    Build & invert GRM → dense matrix multiplications → highly parallelisable
    BLAS and LAPACK library routines awesome
      – highly optimised
      – exploit memory architecture of modern processors
      – multi-threaded versions
    GPU: powerful new hardware for parallel computing
      – thousands of processors
      – can use multiple GPUs simultaneously
      – BLAS/LAPACK etc. libraries available
      – but: limited memory → blocked algorithms
      – but: specialised interface, not much Fortran support yet