SlideShare a Scribd company logo
1 of 20
Download to read offline
Utility of Graphics Processing Units
for dense matrix calculations in
computing and inverting genomic
relationship matrices
Karin Meyer and Bruce Tier
Animal Genetics and Breeding Unit,
University of New England,
Armidale, Australia
AAABG 2013
GPUs for GRMs | Introduction
Computing in the genomics age
Requirements for mixed model analyses have changed!
Pre-genomics:
mixed model equations sparse (few non-zero elements)
– inverse of pedigree based relationship matrix (NRM)
concentrated on exploiting sparsity in computations
Now: Genomic relationship matrix (GRM)
dense (large proportion of elements non-zero)
require manipulation of large, dense matrices to
– compute the GRM
– invert the GRM
– predict breeding values in single-step approach
computationally demanding
exploit new hardware and software libraries
– computing in parallel
K. M. | 2 / 16
GPUs for GRMs | Introduction
Objectives
A first look to examine scope for
speeding up computations to calculate & invert GRM
using parallel computations
exploiting a Graphics Processing Unit (GPU)
“An excursion into the land of computer gaming,
modern compilers and software libraries”
K. M. | 3 / 16
GPUs for GRMs | Introduction
What is a GPU?
Graphical Processing Unit
initially: co-processor for computer gaming
– graphics card
– fast recalculation of pixel values
– remember: i387 math processor for i386
now: GP-GPU adapted to General Purpose computing
– hundreds to thousands of cores
– capable of double precision calculations
– turns desktop PC into personal super-computer
– but: limited memory (currently 6GB max.)
specialised interface for application programming
– CUDA (proprietary of NVIDIA) −→ Fortran interface
– OPENCL (C++ like)
K. M. | 4 / 16
GPUs for GRMs | Introduction
Tesla K20X
2688 cores
6 GB RAM
Peak: 3.95/1.31 Tflops (single/double precision)
K. M. | 5 / 16
GPUs for GRMs | Calculating the GRM
Memory needed: G = ZZ
Gigabytes – single precision matrices
n Gn×n Zn×s
s = 100K 500K 1000K
5 000 0.1 1.9 9.3 18.6
10 000 0.4 3.7 18.6 37.3
20 000 1.5 7.5 37.3 74.5
30 000 3.4 11.2 55.9 111.8
50 000 9.3 18.6 93.1 186.3
−→ break calculations into blocks
−→ use out-of-core storage
→ parallel computing literature
K. M. | 6 / 16
GPUs for GRMs | Calculating the GRM
Blockwise multiplication
All-in-one
=Z
Z
G
G = ZZ
Column-wise division
= + · · · +Z.1 · · · · · · Z.4
Z.1
· · ·
· · ·
Z.4
Z.1Z.1 Z.4Z.4
G =
k
Z.kZ.k
K. M. | 7 / 16
Z: n animals
× s SNPs
G: symmetric
sn(n + 1)/2
flops
GPUs for GRMs | Calculating the GRM
Blockwise multiplication - cont’
Row-wise division
=Zi.
Zj. Gij = Zi.Zj.
Row- and columnwise division
=
Gij =
k
ZikZjk
K. M. | 8 / 16
GPUs for GRMs | Inverting the GRM
Blockwise inversion
G =


GPP GPC GPT
GCP GCC GCT
GTP GTC GTT


1. Subdivide matrix into blocks
C current −→ choosen size
P previous
T trailing
2. Factor & invert chol(GCC)
3. Adjust GPP & GPC
4. ‘Absorb’ into GTT & GCT
5. Calculate inverse
6. Redefine C, P & T; repeat
Gauss-Jordan Elimination
GCC := chol(GCC)
GCC := G−1
CC
GPC := GPCGCC
GPP := GPP + GPCGPC
GCT := GCCGCT
GTT := GTT − GCTGCT
GPT := GPT − GPCGCT
GCT := −(GCCGCT)
GPC := GPCGCC
GCC := GCCGCC
−→ break matrix products into blocks as needed
K. M. | 9 / 16
GPUs for GRMs | Computing environment
Hardware
‘Standard’ multi-core CPU
– Intel I7-3930K
– 3.2 GHz, 12M cache
– 64GB RAM
– 6 cores
GPU
– Nvidia GTX 580
– 512 cores
– 3GB RAM
– capable of double precision calculations
→ high end gaming card, not GP-GPU
K. M. | 10 / 16
GPUs for GRMs | Computing environment
Software
Linux (CentOS 6)
gfortran, gcc 4.4.7; Intel composer XE 13.0.1
Intel MKL library 11.0
– BLAS: SSYRK, SGEMM, STRTRI, STRMM
– LAPACK: SPOTRF, SPOTRI
CUDA 5.0
– CUBLAS: CUBLAS_SSYRK, CUBLAS_SGEMM,
CUBLAS_STRTRI, CUBLAS_STRMM
CULA dense R16a
– CULA_SPOTRF, CULA_SPOTRI, CULA_STRTRI
MAGMA 1.3
– MAGMAF_SLAUUM
K. M. | 11 / 16
GPUs for GRMs | Results
Time to ‘make’ GRM∗
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
GPU
CPU6
CPU1
1 hour
5 hours
10−1
100
101
102
103
0 20000 40000 60000 80000
No. animals
Systemtime(min)
∗
Single precision calculations, s = 500 000
K. M. | 12 / 16
GPUs for GRMs | Results
Time to ‘make’ GRM∗
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
GPU
CPU6
CPU1
1 hour
5 hours
10−1
100
101
102
103
0 20000 40000 60000 80000
No. animals
Systemtime(min)
∗
Single precision calculations, s = 500 000
K. M. | 12 / 16
●
● ●
●
● ●
●
●
●
●
●
● ● ● ●
●
●
● ●
● ● ● ● ● ● ● ● ● ● ● ● ●
● ●
●
● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ●
● ●
● ●
GPU / CPU1
CPU6 / CPU1
5
10
15
20
GPUs for GRMs | Results
Time to invert GRM†
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1 hour
GPU
CPU6
CPU1
10−4
10−3
10−2
10−1
100
101
102
0 20000 40000 60000 80000
No. animals
Systemtime(min)
†
Single precision calculations
K. M. | 13 / 16
GPUs for GRMs | Results
Time to invert GRM†
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
1 hour
GPU
CPU6
CPU1
10−4
10−3
10−2
10−1
100
101
102
0 20000 40000 60000 80000
No. animals
Systemtime(min)
†
Single precision calculations
K. M. | 13 / 16
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●●●●
●●●●
●
●
●●●●●●●●
●●●●●●
●
●●●●●●●●
●
●●●●
●●
●
●●●●
●●
●●●●●●●●●
●
GPU / CPU1
CPU6 / CPU1
5
10
15
GPUs for GRMs | Results
Time to invert GRM‡
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ● ● ●
● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●
● ● ●
●
●
●
● ●
1 hour
GPU
CPU6
CPU1
10−3
10−2
10−1
100
101
102
0 20000 40000 60000 80000
No. animals
Systemtime(min)
‡
Double precision calculations
K. M. | 14 / 16
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
● ● ● ●
●
●
● ●
●
● ● ●
● ●
●
●
● ●
● ●
●
● ● ● ● ●
● ● ● ● ● ●
●
●
● ● ● ● ● ●
● ● ●
●
●
●
●
●
●
●
●
GPU / CPU1
CPU6 / CPU1
4
6
8
GPUs for GRMs | Results | Conclusions
Conclusions
Build & invert GRM
−→ dense matrix multiplications
−→ highly parallelisable
K. M. | 15 / 16
GPUs for GRMs | Results | Conclusions
Conclusions
Build & invert GRM
−→ dense matrix multiplications
−→ highly parallelisable
BLAS and LAPACK library routines awesome
– highly optimised
– exploit memory architecture of modern processors
– multi-threaded versions
K. M. | 15 / 16
GPUs for GRMs | Results | Conclusions
Conclusions
Build & invert GRM
−→ dense matrix multiplications
−→ highly parallelisable
BLAS and LAPACK library routines awesome
– highly optimised
– exploit memory architecture of modern processors
– multi-threaded versions
GPU: powerful new hardware for parallel computing
– thousands of processors
– can use multiple GPUs simultaneously
– BLAS/LAPACK etc. libraries available
– but: limited memory −→ blocked algorithms
– but: specialised interface, not much Fortran support yet
K. M. | 15 / 16
GPUs for GRMs | Results | Conclusions
K. M. | 16 / 16

More Related Content

What's hot

"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ..."NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...Edge AI and Vision Alliance
 
Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Gao Boyang
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Kohei KaiGai
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with GpuRohit Khatana
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overviewRajiv Kumar
 
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...Beniamino Murgante
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)Fatima Qayyum
 
Memory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsMemory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsSepidehShirkhanzadeh
 
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...Koichi Shirahata
 
improve deep learning training and inference performance
improve deep learning training and inference performanceimprove deep learning training and inference performance
improve deep learning training and inference performances.rohit
 
Resource Management with Systemd and cgroups
Resource Management with Systemd and cgroupsResource Management with Systemd and cgroups
Resource Management with Systemd and cgroupsTsung-en Hsiao
 
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC PlatformsProtecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC PlatformsHeechul Yun
 
Graphics Processing Units
Graphics Processing UnitsGraphics Processing Units
Graphics Processing UnitsAmrik Sadhra
 
Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Fisnik Kraja
 
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...Zalando adtech lab
 

What's hot (20)

GPU
GPUGPU
GPU
 
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ..."NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for ...
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1
 
Ultra Fast SOM using CUDA
Ultra Fast SOM using CUDAUltra Fast SOM using CUDA
Ultra Fast SOM using CUDA
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
Parallel computing with Gpu
Parallel computing with GpuParallel computing with Gpu
Parallel computing with Gpu
 
GPU Computing: A brief overview
GPU Computing: A brief overviewGPU Computing: A brief overview
GPU Computing: A brief overview
 
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...Parallel Simulation of Urban Dynamics on the GPU  Ivan Blečić, Arnaldo Cecchi...
Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchi...
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)
 
Graphics processing unit
Graphics processing unitGraphics processing unit
Graphics processing unit
 
Memory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsMemory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware Accelerators
 
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
A Scalable Implementation of a MapReduce-based Graph Processing Algorithm for...
 
improve deep learning training and inference performance
improve deep learning training and inference performanceimprove deep learning training and inference performance
improve deep learning training and inference performance
 
cnsm2011_slide
cnsm2011_slidecnsm2011_slide
cnsm2011_slide
 
Resource Management with Systemd and cgroups
Resource Management with Systemd and cgroupsResource Management with Systemd and cgroups
Resource Management with Systemd and cgroups
 
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC PlatformsProtecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
Protecting Real-Time GPU Kernels in Integrated CPU-GPU SoC Platforms
 
Graphics Processing Units
Graphics Processing UnitsGraphics Processing Units
Graphics Processing Units
 
Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...Designing High Performance Computing Architectures for Reliable Space Applica...
Designing High Performance Computing Architectures for Reliable Space Applica...
 
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
06.09.2017 Computer Science, Machine Learning & Statistiks Meetup - MULTI-GPU...
 

Viewers also liked

7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterol7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterolmegan508
 
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gònnhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gònmaxwell700
 
โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์nuttavorn kesornsiri
 
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)Hiroaki Wagatsuma
 
Piano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano BicoccaPiano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano BicoccaGiulia Caliandro
 
cần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcmcần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcmsung379
 
çİkolatali tatlilar
çİkolatali tatlilarçİkolatali tatlilar
çİkolatali tatlilarhtcuysl13
 
nhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcmnhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcmkraig109
 
Penalized maximum likelihood estimates of genetic covariance matrices wit...
  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...
Penalized maximum likelihood estimates of genetic covariance matrices wit...prettygully
 
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gònnhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gònmitch422
 
Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?Selena Norcini
 

Viewers also liked (15)

Hipertensi¢n arterial
Hipertensi¢n arterialHipertensi¢n arterial
Hipertensi¢n arterial
 
7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterol7 loại thực phẩm giúp giảm cholesterol
7 loại thực phẩm giúp giảm cholesterol
 
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gònnhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
nhận làm dịch vụ giúp việc nhà cao cấp ở sài gòn
 
Com59
Com59Com59
Com59
 
โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์โครงงานคอมพิวเตอร์
โครงงานคอมพิวเตอร์
 
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
Brain-Inspired Robotics and Neural Dynamics: Lecture 02 (2015)
 
Piano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano BicoccaPiano BI - Un'app per il quartiere di Milano Bicocca
Piano BI - Un'app per il quartiere di Milano Bicocca
 
cần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcmcần thuê dịch vụ giúp việc theo tháng ở tphcm
cần thuê dịch vụ giúp việc theo tháng ở tphcm
 
çİkolatali tatlilar
çİkolatali tatlilarçİkolatali tatlilar
çİkolatali tatlilar
 
nhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcmnhận làm dịch vụ giúp việc quận 6 tại hcm
nhận làm dịch vụ giúp việc quận 6 tại hcm
 
Penalized maximum likelihood estimates of genetic covariance matrices wit...
  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...  Penalized maximum likelihood  estimates of genetic covariance  matrices wit...
Penalized maximum likelihood estimates of genetic covariance matrices wit...
 
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gònnhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
nhận làm dịch vụ giúp việc theo giờ bảo đảm ở sài gòn
 
ISA 2015
ISA  2015 ISA  2015
ISA 2015
 
The pitch
The pitchThe pitch
The pitch
 
Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?Have you ever thought how the perfect packaging should be?
Have you ever thought how the perfect packaging should be?
 

Similar to Slides meyer116

Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015Kohei KaiGai
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010John Holden
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modelingnadikari123
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMiHBaseCon
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidiaMail.ru Group
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUsSri Ambati
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneOpen-NFP
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020NECST Lab @ Politecnico di Milano
 
Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoSRohit Jnagal
 
GPUrdma - Presentation
GPUrdma - PresentationGPUrdma - Presentation
GPUrdma - PresentationFeras Daoud
 
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxPACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxssuser30e7d2
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)Amal R
 
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudPart 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudUniva, an Altair Company
 
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...Tarik Reza Toha
 
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionPreferred Networks
 

Similar to Slides meyer116 (20)

Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015PG-Strom - GPGPU meets PostgreSQL, PGcon2015
PG-Strom - GPGPU meets PostgreSQL, PGcon2015
 
Monte Carlo G P U Jan2010
Monte  Carlo  G P U  Jan2010Monte  Carlo  G P U  Jan2010
Monte Carlo G P U Jan2010
 
Statistical power consumption analysis and modeling
Statistical power consumption analysis and modelingStatistical power consumption analysis and modeling
Statistical power consumption analysis and modeling
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
 
Measuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data PlaneMeasuring a 25 and 40Gb/s Data Plane
Measuring a 25 and 40Gb/s Data Plane
 
GPU
GPUGPU
GPU
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
 
Memory Bandwidth QoS
Memory Bandwidth QoSMemory Bandwidth QoS
Memory Bandwidth QoS
 
GPUrdma - Presentation
GPUrdma - PresentationGPUrdma - Presentation
GPUrdma - Presentation
 
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxPACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
 
Graphics processing unit (GPU)
Graphics processing unit (GPU)Graphics processing unit (GPU)
Graphics processing unit (GPU)
 
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudPart 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
 
NVIDIA CUDA
NVIDIA CUDANVIDIA CUDA
NVIDIA CUDA
 
Hotspot gc
Hotspot gcHotspot gc
Hotspot gc
 
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
Exploiting a Synergy between Greedy Approach and NSGA for Scheduling in Compu...
 
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place SolutionKaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th Place Solution
 

Recently uploaded

GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 

Recently uploaded (20)

GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 

Slides meyer116

  • 1. Utility of Graphics Processing Units for dense matrix calculations in computing and inverting genomic relationship matrices Karin Meyer and Bruce Tier Animal Genetics and Breeding Unit, University of New England, Armidale, Australia AAABG 2013
  • 2. GPUs for GRMs | Introduction Computing in the genomics age Requirements for mixed model analyses have changed! Pre-genomics: mixed model equations sparse (few non-zero elements) – inverse of pedigree based relationship matrix (NRM) concentrated on exploiting sparsity in computations Now: Genomic relationship matrix (GRM) dense (large proportion of elements non-zero) require manipulation of large, dense matrices to – compute the GRM – invert the GRM – predict breeding values in single-step approach computationally demanding exploit new hardware and software libraries – computing in parallel K. M. | 2 / 16
  • 3. GPUs for GRMs | Introduction Objectives A first look to examine scope for speeding up computations to calculate & invert GRM using parallel computations exploiting a Graphics Processing Unit (GPU) “An excursion into the land of computer gaming, modern compilers and software libraries” K. M. | 3 / 16
  • 4. GPUs for GRMs | Introduction What is a GPU? Graphical Processing Unit initially: co-processor for computer gaming – graphics card – fast recalculation of pixel values – remember: i387 math processor for i386 now: GP-GPU adapted to General Purpose computing – hundreds to thousands of cores – capable of double precision calculations – turns desktop PC into personal super-computer – but: limited memory (currently 6GB max.) specialised interface for application programming – CUDA (proprietary of NVIDIA) −→ Fortran interface – OPENCL (C++ like) K. M. | 4 / 16
  • 5. GPUs for GRMs | Introduction Tesla K20X 2688 cores 6 GB RAM Peak: 3.95/1.31 Tflops (single/double precision) K. M. | 5 / 16
  • 6. GPUs for GRMs | Calculating the GRM Memory needed: G = ZZ Gigabytes – single precision matrices n Gn×n Zn×s s = 100K 500K 1000K 5 000 0.1 1.9 9.3 18.6 10 000 0.4 3.7 18.6 37.3 20 000 1.5 7.5 37.3 74.5 30 000 3.4 11.2 55.9 111.8 50 000 9.3 18.6 93.1 186.3 −→ break calculations into blocks −→ use out-of-core storage → parallel computing literature K. M. | 6 / 16
  • 7. GPUs for GRMs | Calculating the GRM Blockwise multiplication All-in-one =Z Z G G = ZZ Column-wise division = + · · · +Z.1 · · · · · · Z.4 Z.1 · · · · · · Z.4 Z.1Z.1 Z.4Z.4 G = k Z.kZ.k K. M. | 7 / 16 Z: n animals × s SNPs G: symmetric sn(n + 1)/2 flops
  • 8. GPUs for GRMs | Calculating the GRM Blockwise multiplication - cont’ Row-wise division =Zi. Zj. Gij = Zi.Zj. Row- and columnwise division = Gij = k ZikZjk K. M. | 8 / 16
  • 9. GPUs for GRMs | Inverting the GRM Blockwise inversion G =   GPP GPC GPT GCP GCC GCT GTP GTC GTT   1. Subdivide matrix into blocks C current −→ choosen size P previous T trailing 2. Factor & invert chol(GCC) 3. Adjust GPP & GPC 4. ‘Absorb’ into GTT & GCT 5. Calculate inverse 6. Redefine C, P & T; repeat Gauss-Jordan Elimination GCC := chol(GCC) GCC := G−1 CC GPC := GPCGCC GPP := GPP + GPCGPC GCT := GCCGCT GTT := GTT − GCTGCT GPT := GPT − GPCGCT GCT := −(GCCGCT) GPC := GPCGCC GCC := GCCGCC −→ break matrix products into blocks as needed K. M. | 9 / 16
  • 10. GPUs for GRMs | Computing environment Hardware ‘Standard’ multi-core CPU – Intel I7-3930K – 3.2 GHz, 12M cache – 64GB RAM – 6 cores GPU – Nvidia GTX 580 – 512 cores – 3GB RAM – capable of double precision calculations → high end gaming card, not GP-GPU K. M. | 10 / 16
  • 11. GPUs for GRMs | Computing environment Software Linux (CentOS 6) gfortran, gcc 4.4.7; Intel composer XE 13.0.1 Intel MKL library 11.0 – BLAS: SSYRK, SGEMM, STRTRI, STRMM – LAPACK: SPOTRF, SPOTRI CUDA 5.0 – CUBLAS: CUBLAS_SSYRK, CUBLAS_SGEMM, CUBLAS_STRTRI, CUBLAS_STRMM CULA dense R16a – CULA_SPOTRF, CULA_SPOTRI, CULA_STRTRI MAGMA 1.3 – MAGMAF_SLAUUM K. M. | 11 / 16
  • 12. GPUs for GRMs | Results Time to ‘make’ GRM∗ ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● GPU CPU6 CPU1 1 hour 5 hours 10−1 100 101 102 103 0 20000 40000 60000 80000 No. animals Systemtime(min) ∗ Single precision calculations, s = 500 000 K. M. | 12 / 16
  • 13. GPUs for GRMs | Results Time to ‘make’ GRM∗ ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● GPU CPU6 CPU1 1 hour 5 hours 10−1 100 101 102 103 0 20000 40000 60000 80000 No. animals Systemtime(min) ∗ Single precision calculations, s = 500 000 K. M. | 12 / 16 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● GPU / CPU1 CPU6 / CPU1 5 10 15 20
  • 14. GPUs for GRMs | Results Time to invert GRM† ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1 hour GPU CPU6 CPU1 10−4 10−3 10−2 10−1 100 101 102 0 20000 40000 60000 80000 No. animals Systemtime(min) † Single precision calculations K. M. | 13 / 16
  • 15. GPUs for GRMs | Results Time to invert GRM† ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1 hour GPU CPU6 CPU1 10−4 10−3 10−2 10−1 100 101 102 0 20000 40000 60000 80000 No. animals Systemtime(min) † Single precision calculations K. M. | 13 / 16 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●●●● ●●●● ● ● ●●●●●●●● ●●●●●● ● ●●●●●●●● ● ●●●● ●● ● ●●●● ●● ●●●●●●●●● ● GPU / CPU1 CPU6 / CPU1 5 10 15
  • 16. GPUs for GRMs | Results Time to invert GRM‡ ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● 1 hour GPU CPU6 CPU1 10−3 10−2 10−1 100 101 102 0 20000 40000 60000 80000 No. animals Systemtime(min) ‡ Double precision calculations K. M. | 14 / 16 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● GPU / CPU1 CPU6 / CPU1 4 6 8
  • 17. GPUs for GRMs | Results | Conclusions Conclusions Build & invert GRM −→ dense matrix multiplications −→ highly parallelisable K. M. | 15 / 16
  • 18. GPUs for GRMs | Results | Conclusions Conclusions Build & invert GRM −→ dense matrix multiplications −→ highly parallelisable BLAS and LAPACK library routines awesome – highly optimised – exploit memory architecture of modern processors – multi-threaded versions K. M. | 15 / 16
  • 19. GPUs for GRMs | Results | Conclusions Conclusions Build & invert GRM −→ dense matrix multiplications −→ highly parallelisable BLAS and LAPACK library routines awesome – highly optimised – exploit memory architecture of modern processors – multi-threaded versions GPU: powerful new hardware for parallel computing – thousands of processors – can use multiple GPUs simultaneously – BLAS/LAPACK etc. libraries available – but: limited memory −→ blocked algorithms – but: specialised interface, not much Fortran support yet K. M. | 15 / 16
  • 20. GPUs for GRMs | Results | Conclusions K. M. | 16 / 16